Method and device for reducing the time required to perform a product, multiplication and modular exponentiation calculation using the Montgomery method

ABSTRACT

The invention relates to a method for speeding up the time required to perform a Montgomery product calculation by applying the high-radix Montgomery method on computing hardware. A loop of operations (72) is performed consisting in repeating successive operations, i.e.: a first addition operation (76) involving the addition of a value of one of several first products, designated {overscore (a)}_(i).{overscore (b)}, and a value of one variable, designated u, according to a first relationship u:=u+{overscore (a)}_(i).{overscore (b)}; and a second addition operation (80) involving the addition of a value of one of several second products, designated m.n, and a value of variable u according to a second relationship u:=u+m.n. At least the first and second addition operations are carry-save addition operations in order to speed up the time required to perform an addition.

[0001] The invention relates to methods and devices for speeding up the time required to perform modular arithmetic operations, and more particularly a modular exponentiation, a modular multiplication and a Montgomery product on computing means.

[0002] A modular multiplication operation consists of carrying out the following operation:

a.b mod n;

[0003] where a, b and n are integers, n being called the modulus.

[0004] In a conventional manner, in order to effect a modular multiplication the computing means first of all carry out a multiplication of a by b, followed by a modulo n reduction. The time for performing this operation is proportional to k², where k is the number of bits necessary in order to encode respectively a, b and n in binary form.

[0005] In a manner which is equally well known to mathematicians, a modular multiplication can be carried out by the Montgomery method. This method introduces Montgomery products as described in the document by Cetin Kaya Koç, “High Speed RSA Implementation”, which may be obtained from the following address:

[0006] RSA Laboratories

[0007] RSA Data Security, Inc.

[0008] 100, Marine Parkway, Suite 500

[0009] Redwood City, Calif. 94065-1031

[0010] U.S.A.

[0011] In the following description this document will be referred to as D1.

[0012] A modular exponentiation operation consists of carrying out the following operation:

x^(c) mod n;

[0013] where x, c and n are integers, n being the modulus.

[0014] The calculation of this exponentiation by known methods, such as for example the “square and multiply” method, involves of the order of k modular multiplications, k being the number of bits necessary in order to encode respectively x, c and n in binary form. Since each modular multiplication takes a time proportional to k², it is assumed that the time for performing this operation is proportional to k³.
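By way of illustration only, the “square and multiply” method mentioned above may be sketched as follows. The function name and the left-to-right scanning of the exponent bits are assumptions made for this sketch and are not taken from the document D1.

```python
def square_and_multiply(x, c, n):
    """Compute x^c mod n by scanning the bits of c from most to least significant."""
    result = 1 % n
    for bit in bin(c)[2:]:
        # One modular squaring per exponent bit.
        result = (result * result) % n
        if bit == "1":
            # One extra modular multiplication when the bit is set.
            result = (result * x) % n
    return result
```

For an exponent of k bits the loop performs k squarings and at most k multiplications, which is the source of the k modular multiplications referred to above.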

[0015] The modular exponentiation operations constitute basic operations of data encrypting/decrypting devices. For example, the encrypting/decrypting devices implementing the RSA (Rivest-Shamir-Adleman) algorithm use modular exponentiations.

[0016] These devices currently exist in various forms, such as electronic components or electronic cards intended to be associated with computing means in order to perform and/or to speed up the encrypting/decrypting operations.

[0017] Electronic commerce, particularly on the Internet, uses a large number of these encrypting/decrypting devices in order to encrypt and decrypt commercial operations such as payments. The turnover of companies carrying out electronic commerce is therefore limited by the number of encrypting and decrypting operations which can be performed per second.

[0018] It will therefore be appreciated that it is important to speed up the time required for performing a Montgomery product calculation, a modular multiplication and a modular exponentiation on a machine equipped with computing means.

[0019] Therefore the object of the invention is to propose a method and a device for speeding up the time required to perform a Montgomery product calculation, a modular multiplication and a modular exponentiation on a machine equipped with computing means.

[0020] The invention therefore relates to a method for speeding up the time required to perform a Montgomery product calculation by applying the high-radix Montgomery method on computing hardware, the said method comprising a loop of operations consisting of reiterating successive operations, including in particular:

[0021] a first addition operation between a value of one of several first products, denoted {overscore (a)}_(i).{overscore (b)}, and a value of a variable, denoted u, according to a first relationship u:=u+{overscore (a)}_(i).{overscore (b)};

[0022] a second addition operation between a value of one of several second products, denoted m.n, and a value of the variable u according to a second relationship u:=u+m.n;

[0023] characterised in that at least the said first and second addition operations are carry-save addition operations in order to speed up the time required for performing an addition.

[0024] According to other characteristics and advantages of the invention, the method comprises:

[0025] in a loop of operations a third operation of division of the variable u by a power of 2, denoted 2^(ω), where ω is the radix, according to a third relationship u:=u/2^(ω),

[0026] characterised in that the variable u is registered in the form of a carry-save ordered pair formed by two variables, denoted C and S, for performing operations of the loop, and that the third operation of division of the variable u in the form of a carry-save ordered pair is carried out in two steps, namely:

[0027] a preliminary step of calculation and storage of a carry digit, denoted R_(e), which is at risk of being lost by the division of each variable C and S by the power of 2;

[0028] a step of division of each variable C and S by the power of 2;

[0029] the preliminary step of calculation of the carry digit R_(e) comprises the operation of adding in a conventional manner the ω least significant bits of the variable C, denoted C₀, to the ω least significant bits of the variable S, denoted S₀, according to a fourth relationship R_(e):=C₀+S₀;

[0030] a recombination of u on the basis of the variables C and S of the carry-save ordered pair and of the carry digit R_(e) comprises the operation of shifting to the right by ω bits the carry digit R_(e) and in a conventional manner adding the result obtained to the variables C and S according to a fifth relationship u:=C+S+R_(e)/2^(ω);

[0031] it comprises at the end of performing the loop of operations:

[0032] a step of recombination (84) of the variable u on the basis of at least the values of the variables C and S of the carry-save ordered pair calculated during the performance of the loop of operations, and

[0033] a step of reduction (86) of the variable u according to a sixth relationship u:=u−n, where n is a modulus,

[0034] the said steps of recombination and of reduction of the variable u overlapping in such a way as to speed up the time required to perform them;

[0035] the radix ω is equal to 4 bits in order to optimise the time required for performing the calculation of a Montgomery product on the input variables of the Montgomery product encoded on 512 or 1024 bits;

[0036] the first products {overscore (a)}_(i).{overscore (b)} are pre-calculated before performing the loop of operations; and

[0037] the second products m.n are pre-calculated before performing the loop of operations.
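The handling of the variable u as a carry-save ordered pair (C, S), and the role of the carry digit R_(e) in the division by 2^(ω), may be illustrated by the following minimal software sketch. It models the arithmetic only and is not a description of the hardware adders; the function names are hypothetical, and the sketch assumes a single division step with non-negative integers.

```python
def carry_save_add(c, s, x):
    """Add x to the pair (c, s) bit by bit, without propagating carries."""
    new_s = c ^ s ^ x                            # bitwise sum of the three inputs
    new_c = ((c & s) | (c & x) | (s & x)) << 1   # carries, moved one bit to the left
    return new_c, new_s


def carry_save_divide(c, s, w):
    """Divide each of c and s by 2^w, keeping the carry digit that would be lost."""
    mask = (1 << w) - 1
    r_e = (c & mask) + (s & mask)   # fourth relationship: R_e := C0 + S0
    return c >> w, s >> w, r_e


def recombine(c, s, r_e, w):
    """Fifth relationship: u := C + S + R_e / 2^w (conventional additions)."""
    return c + s + (r_e >> w)
```

The value represented by the pair is always C+S. Shifting C and S separately discards their ω least significant bits, whose sum may exceed 2^(ω)−1; the digit R_(e) retains exactly that lost carry, so the recombined value equals the integer quotient of C+S by 2^(ω).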

[0038] The invention also relates to a method of speeding up the time required to perform the calculation of a first and a second Montgomery product by applying for each product a method including at least one first step during which the first addition operation for the first product is carried out at the same time as the second addition operation for the second product.

[0039] According to other characteristics and advantages of this method for speeding up the time required to perform the calculation of a first and a second Montgomery product:

[0040] it comprises at least a second step shifted in time with respect to the first, during which the second addition operation for the first product is carried out at the same time as the first addition operation for the second product;

[0041] it comprises at the end of performing the loop of operations:

[0042] a step of recombination then of reduction for the first product performed first; and then,

[0043] a step of recombination then of reduction for the second product performed second;

[0044] one of the input variables of the first Montgomery product performed first is made up of the least significant bits of a variable, and one of the input variables of the second Montgomery product performed second is made up of the most significant bits of this same variable.

[0045] The invention also relates to a method of speeding up the time required for performing a modular multiplication calculation by applying a method implementing Montgomery products, characterised in that the calculation of the Montgomery products is carried out by applying at least one of the methods according to the invention.

[0046] According to other characteristics and advantages of this method for speeding up the time required for performing a modular multiplication calculation:

[0047] the said method implementing Montgomery products is the Montgomery method.

[0048] The invention also relates to a method of speeding up the time required for performing a modular exponentiation calculation by applying a method implementing modular multiplications, the calculation of the modular multiplications being carried out by applying a method according to the invention.

[0049] According to other characteristics and advantages of this method of speeding up the time required for performing a modular exponentiation calculation:

[0050] the said method implementing modular multiplications is the m-ary method with a word size of r bits;

[0051] the word size r of the m-ary method is equal to 5 bits in order to speed up the time for performing the m-ary method when input variables of the modular exponentiation calculation are encoded on 512 or 1024 bits;

[0052] the second products m.n are pre-calculated before applying the m-ary method;

[0053] the said method implementing modular multiplications is the Chinese remainders method.
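As an illustration of the m-ary method with a word size of r bits, the following sketch processes the exponent r bits at a time. It is a generic textbook form given under stated assumptions: the function name, the table of pre-calculated powers and the use of ordinary modular multiplications in place of Montgomery products are choices made for this sketch only.

```python
def m_ary_pow(x, c, n, r=5):
    """Compute x^c mod n by reading the exponent c in words of r bits."""
    # Pre-calculate x^0, x^1, ..., x^(2^r - 1) modulo n.
    table = [1 % n] * (1 << r)
    for j in range(1, 1 << r):
        table[j] = (table[j - 1] * x) % n

    # Split the exponent into r-bit words, least significant word first.
    words = []
    while c:
        words.append(c & ((1 << r) - 1))
        c >>= r

    result = 1 % n
    for word in reversed(words):
        # r squarings shift the partial exponent to the left by r bits.
        for _ in range(r):
            result = (result * result) % n
        # One multiplication per non-zero word instead of one per set bit.
        if word:
            result = (result * table[word]) % n
    return result
```

With r equal to 5 bits the table holds 32 entries, and a 512-bit exponent is read in 103 words.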

[0054] The invention also relates to a method of speeding up the time required for performing a first modular exponentiation calculation by applying a method implementing second modular exponentiations, the second modular exponentiations being carried out by applying a method according to the invention.

[0055] According to other characteristics and advantages of this method of speeding up the time for performing the calculation of a first exponentiation:

[0056] the said method implementing second modular exponentiations is the Chinese remainders method;

[0057] it is applied to numbers encoded on more than 320 bits.

[0058] The invention also relates to a computer programme comprising programme code instructions for performing certain steps of the method according to the invention when the said programme is executed on principal computing means associated with the said computing hardware.

[0059] The invention also relates to a system for speeding up the time required to perform a Montgomery product calculation using the high-radix Montgomery method on computing hardware, the said system comprising:

[0060] means for effecting a first addition operation between a value of one of several first products, denoted {overscore (a)}_(i).{overscore (b)}, and a value of a variable, denoted u, according to a first relationship u:=u+{overscore (a)}_(i).{overscore (b)};

[0061] means for effecting a second addition operation between a value of one of several second products, denoted m.n, and a value of the variable u according to a second relationship u:=u+m.n,

[0062] characterised in that the means for effecting the first and the second addition operations include at least one carry-save adder;

[0063] according to other characteristics and advantages of this system:

[0064] the means for effecting the first and the second addition operations include at least one first carry-save adder adapted to carry out the first addition operation and a second carry-save adder (158; 232) adapted to carry out the second addition operation;

[0065] it includes conventional means for carrying out a third operation of division of the variable u by a power of 2, denoted 2^(ω), where ω is the radix, according to a third relationship u:=u/2^(ω);

[0066] it includes means for storing the variable u in the form of a carry-save ordered pair formed by two variables, denoted C and S, and means for carrying out the third operation of division of the variable u in the form of a carry-save ordered pair comprising:

[0067] means for calculation and storage of a carry digit, denoted R_(e), which is at risk of being lost by the division of each variable C and S by the power of 2;

[0068] means for division of each variable C and S by the power of 2;

[0069] the means for calculation and storage of the carry digit R_(e) include means for conventional addition of the ω least significant bits of the variable C, denoted C₀, to the ω least significant bits of the variable S, denoted S₀, according to a fourth relationship R_(e):=C₀+S₀;

[0070] it comprises:

[0071] means for recombination of the variable u at least on the basis of the values of the variables C and S of the carry-save ordered pair;

[0072] means for reduction of the variable u, the said means for recombination of the variable u and the said means for reduction being connected to one another in such a way that operation thereof overlaps under the control of the control means;

[0073] the radix ω is equal to 4 bits in order to optimise the time required to perform a Montgomery product calculation on input variables of the Montgomery product encoded on 512 or 1024 bits;

[0074] it includes means for pre-calculation of the first products {overscore (a)}_(i).{overscore (b)};

[0075] it includes means for pre-calculation of the second products m.n;

[0076] the said means for pre-calculation of the first and/or the second products include a conventional adder.

[0077] The invention also relates to a system for speeding up the time required to perform the calculation of a first and a second Montgomery product, characterised in that it includes two carry-save adders which are activated simultaneously.

[0078] According to another characteristic of the system for speeding up the time required to perform the calculation of a first and a second Montgomery product, it includes a single means for recombining the variable u on the basis of at least the values of the variables C and S of the carry-save ordered pair, connected to the input of a single means for reduction of the variable u.

[0079] The invention also relates to a system for speeding up the time required to perform a modular multiplication calculation by a method implementing Montgomery products, the said Montgomery product calculations being performed on computing hardware, characterised in that it includes at least one system for speeding up the time required to perform the calculation of the Montgomery products according to the invention.

[0080] The invention also relates to a system for speeding up the time required to perform a modular multiplication calculation by the Montgomery method implementing Montgomery products on computing hardware, characterised in that it includes at least one system for speeding up the time required to perform the calculation of the Montgomery products as claimed in one of claims 24 to 34.

[0081] The invention also relates to a system for speeding up the time required to perform a modular exponentiation calculation by a method implementing modular multiplications, characterised in that it includes at least one system for speeding up the time required to perform the calculation of the modular multiplications according to the invention.

[0082] The invention also relates to a system for speeding up the time required to perform a modular exponentiation calculation by the m-ary method with a word size of r bits implementing modular multiplications, characterised in that it includes at least one system for speeding up the time required to perform the calculation of the modular multiplications according to the invention.

[0083] According to another characteristic of the system for speeding up the time required to perform the modular exponentiation calculation by the m-ary method, it includes at least one register for shifting 5 bits to the left in order to speed up the performance of the m-ary method with a word size r of the m-ary method equal to 5 bits.

[0084] The invention also relates to a system for speeding up the time required to perform the calculation of a modular exponentiation by the Chinese remainders method implementing modular multiplications, characterised in that it includes at least one system for speeding up the time required to perform the modular multiplication calculation according to the invention.

[0085] The invention also relates to a system for speeding up the time required to perform the calculation of a first modular exponentiation by a method implementing second modular exponentiations, characterised in that it includes at least one system for speeding up the time required to perform the calculation of the second modular exponentiations according to the invention.

[0086] The invention also relates to a system for speeding up the time required to perform at least a first modular exponentiation calculation by the Chinese remainders method which itself implements second modular exponentiations, characterised in that it includes at least one system for speeding up the time required to perform the calculation of the second modular exponentiations according to the invention.

[0087] The invention also relates to an electronic component which includes at least one system according to the invention.

[0088] According to another characteristic of this component, it is formed with at least one FPGA.

[0089] The invention also relates to an electronic card which includes at least one system according to the invention.

[0090] According to another characteristic of this electronic card, it conforms to the PCI standard.

[0091] The invention also relates to a machine characterised in that it is associated with at least one system according to the invention.

[0092] The invention also relates to a method of speeding up the time required to perform the calculation of a first modular exponentiation, denoted M^(E) mod n, where M is the input message, E is the exponent and n is the modulus, on principal computing means, characterised in that it further comprises:

[0093] a first step of separating the calculation of the first modular exponentiation into two second modular exponentiations by applying the Chinese remainders method,

[0094] a second step consisting of calculating each of the second modular exponentiations by applying the m-ary method which implements modular multiplications,

[0095] steps consisting of effecting the modular multiplications by applying a method implementing Montgomery products.

[0096] According to other characteristics and advantages of this method for speeding up the time required to perform the calculation of a first modular exponentiation:

[0097] the input variables are natural integers encoded on more than 320 bits;

[0098] the word size r of the m-ary method is equal to 5 bits in order to speed up the time required to perform the m-ary method when the input variables of the calculation of the modular exponentiation are encoded on 512 or 1024 bits;

[0099] the calculations of the second modular exponentiations are carried out substantially in parallel; and

[0100] the Montgomery products are calculated using the high-radix Montgomery method.

[0101] The high-radix Montgomery method is implemented in accordance with one of the methods according to the invention.
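The separation of a first modular exponentiation M^(E) mod n into two second modular exponentiations by the Chinese remainders method may be sketched as follows. The sketch assumes that the modulus n is the product of two distinct known primes, named p_factor and q_factor here, and uses a Garner-style recombination; these names, the recombination formula and the use of the built-in pow function (Python 3.8 or later for the modular inverse) are assumptions of the sketch and are not taken from the description.

```python
def crt_pow(msg, e, p_factor, q_factor):
    """Compute msg^e mod (p_factor * q_factor) from two half-size exponentiations."""
    # The two "second" exponentiations are independent of one another,
    # so they could be carried out substantially in parallel.
    m_p = pow(msg, e, p_factor)
    m_q = pow(msg, e, q_factor)

    # Recombination: find the value congruent to m_p mod p_factor
    # and to m_q mod q_factor.
    q_inv = pow(q_factor, -1, p_factor)
    h = (q_inv * (m_p - m_q)) % p_factor
    return m_q + h * q_factor
```

Each second exponentiation works on a modulus of about half the length of n, which is where the gain in time comes from when the cost grows with the cube of the number of bits.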

[0102] The invention also relates to a computer programme comprising programme code instructions for performing certain steps of a method according to the invention when the said programme is executed on the principal computing means.

[0103] The invention will be better understood upon reading the following description which is given solely by way of example and with reference to the accompanying drawings, in which:

[0104] FIG. 1 shows the Montgomery method for carrying out a modular multiplication;

[0105] FIG. 2 shows a method of calculating a Montgomery product in its high-radix form;

[0106] FIG. 3A is an electronic diagram of a carry-save adder;

[0107] FIG. 3B is an electronic diagram of a conventional adder;

[0108] FIG. 4 is an example of division of a number represented in the form of a carry-save ordered pair;

[0109] FIG. 5 shows a method of calculating a Montgomery product according to the invention;

[0110] FIG. 6 shows a method of modular exponentiation according to the m-ary method;

[0111] FIG. 7 shows a method of modular exponentiation according to the invention;

[0112] FIG. 8 shows the Chinese remainders method;

[0113] FIG. 9 is a schematic view of a Montgomery multiplier according to the invention; and

[0114] FIG. 10 is a schematic view of a modular exponentiator according to the invention.

[0115] The following notations are used in the description which follows:

[0116] D2 denotes the following document: Cetin Kaya Koç, “RSA Hardware Implementation”, which may be obtained from the same address as the previously mentioned document D1;

[0117] := is the allocation symbol, thus X:=M signifies that the value of a variable denoted M is allocated to a variable denoted X;

[0118] “dec” indicates that the digit which precedes it is in decimal notation;

[0119] “FPGA component” refers to the known programmable component of the FPGA (field programmable gate array) type.

[0120] FIG. 1 shows the Montgomery method for carrying out a modular multiplication between a first input variable denoted “a” and a second input variable denoted “b” according to the following relationship:

a.b mod n;

[0121] where a, b and n are natural integers, n being the modulus.

[0122] The following description of this method only presents the information necessary for an understanding of the invention. For further information the reader may refer, for example, to the document D1, chapter 3.8 “Montgomery's method”.

[0123] The modular multiplication according to the Montgomery method is carried out in five successive steps numbered 2, 4, 6, 8 and 10 in FIG. 1.

[0124] The step 2 consists of calculating the variable n′₀ according to the following relationship:

n′₀=−n₀⁻¹;

[0125] where:

[0126] the sign − represents the operation of complement to 1;

[0127] n₀ represents the ω least significant bits of the modulus n, ω being called the radix;

[0128] n₀⁻¹ represents the inverse of n₀ and is defined by the relationship n₀.n₀⁻¹=1 mod (2^(ω)), this equation being solved by known methods such as the extended Euclidean algorithm.
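The calculation of step 2 may be sketched as follows. The sketch makes two assumptions which are labelled as such: the modulus n is odd, so that n₀ has an inverse modulo 2^(ω), and the sign − is interpreted as negation modulo 2^(ω), which is the interpretation under which the sum u+m.n formed later in the method of FIG. 2 is divisible by 2^(ω). The function name is hypothetical, and the built-in pow function (Python 3.8 or later) stands in for the extended Euclidean algorithm.

```python
def n0_prime(n, w):
    """Return n'0 such that n * n'0 = -1 modulo 2^w, for an odd modulus n."""
    mask = (1 << w) - 1
    n0 = n & mask                  # the w least significant bits of n
    n0_inv = pow(n0, -1, 1 << w)   # inverse of n0 modulo 2^w
    return (-n0_inv) & mask        # negation taken modulo 2^w (assumption)
```

A quick check of the defining property is that (n * n0_prime(n, w)) modulo 2^w equals 2^w − 1.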

[0129] The significance of the calculation of n′₀ in this step will become apparent upon reading the description of FIG. 2.

[0130] In the second step 4 the Montgomery remainder of the input variable a, denoted {overscore (a)}, is calculated according to the following relationship:

{overscore (a)}:=a.p mod n;

[0131] where:

[0132] a is the first input variable of the modular product;

[0133] n is the modulus of the modular product;

[0134] p is defined by the following relationship: p=2^(k), where k is the natural integer such that: 2^(k−1)≦n<2^(k).

[0135] In the third step 6 the Montgomery remainder of the input variable b, denoted {overscore (b)}, is calculated according to the following relationship:

{overscore (b)}:=b.p mod n;

[0136] where:

[0137] b is the second input variable of the modular product;

[0138] n is the modulus;

[0139] p is identical to the variable p defined in the second step 4.

[0140] In the fourth step 8 the Montgomery product between the remainder {overscore (a)} and the remainder {overscore (b)} is calculated and the result is allocated to a variable {overscore (x)} according to the following relationship:

{overscore (x)}:=MonPro({overscore (a)}, {overscore (b)});

[0141] where:

[0142] {overscore (a)} and {overscore (b)} are the remainders calculated respectively at steps 4 and 6;

[0143] MonPro represents the Montgomery product operation between the variables {overscore (a)} and {overscore (b)}. This operation will be described later with regard to FIG. 2.

[0144] In the fifth step 10 the Montgomery product between the variable {overscore (x)} and the unit is calculated and the result is allocated to a variable x according to the following relationship:

x:=MonPro({overscore (x)},1);

[0145] where:

[0146] {overscore (x)} is the variable calculated at the fourth step 8;

[0147] 1 represents the unit;

[0148] MonPro represents the Montgomery product operation.

[0149] At the end of the five steps 2, 4, 6, 8 and 10 the result of the multiplication of the first variable a by the second variable b modulo n is obtained in the variable x.
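The sequence of steps 4 to 10 may be checked numerically with the following sketch. Here MonPro is modelled directly from its definition, that is to say the product of its two arguments multiplied by the inverse of p modulo n, rather than by the loop of FIG. 2, so that step 2 is not needed. The function names are hypothetical, and the sketch assumes an odd modulus n and Python 3.8 or later for the modular inverse.

```python
def mon_pro_reference(a_bar, b_bar, n, k):
    """Reference model of MonPro: a_bar * b_bar * p^-1 mod n with p = 2^k."""
    p_inv = pow(1 << k, -1, n)
    return (a_bar * b_bar * p_inv) % n


def montgomery_multiply(a, b, n):
    """Compute a.b mod n through steps 4, 6, 8 and 10 of the method of FIG. 1."""
    k = n.bit_length()        # k such that 2^(k-1) <= n < 2^k
    p = 1 << k
    a_bar = (a * p) % n       # step 4: Montgomery remainder of a
    b_bar = (b * p) % n       # step 6: Montgomery remainder of b
    x_bar = mon_pro_reference(a_bar, b_bar, n, k)   # step 8
    return mon_pro_reference(x_bar, 1, n, k)        # step 10
```

The factor p introduced twice at steps 4 and 6 is removed once at step 8 and once at step 10, which is why the final value is a.b mod n.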

[0150] FIG. 2 shows the Montgomery method in its high-radix form for calculating a Montgomery product, also referred to here as the high-radix Montgomery method.

[0151] The following description of this method only presents the information necessary for an understanding of the invention. For further information the reader may refer, for example, to the document D2, chapter 7.5 “High radix Montgomery's method”.

[0152] The calculation of a Montgomery product corresponds to the MonPro operations of FIG. 1. This operation will be presented in the particular case of step 8 of FIG. 1, that is to say that the following calculation is described here:

MonPro({overscore (a)}, {overscore (b)})={overscore (a)}.{overscore (b)}.p⁻¹ mod n;

[0153] where:

[0154] {overscore (a)} and {overscore (b)} are the respective Montgomery remainders of the variables a and b calculated at steps 4 and 6 of FIG. 1;

[0155] p⁻¹ is the modulo n inverse of the variable p defined during the description of step 4 such that p⁻¹ satisfies the following relationship: p.p⁻¹=1 mod n.

[0156] This method has three principal steps 16, 18 and 20. The first step 16 consists of initialising a variable u and an index i according to the following relationships: u:=0; i:=0. It also consists of pre-calculating first products {overscore (a)}_(i).{overscore (b)} which will be defined with regard to operation 24 of this method.

[0157] The second step 18 consists of repeating a loop of operations as long as the index i is less than or equal to a variable s−1, the index i being incremented at the end of each iteration of the loop. This loop of operations is denoted in a conventional manner “for i=0 to s−1”. The variable s which determines the number of iterations is defined here by the following relationship:

k=s.ω;

[0158] where:

[0159] k represents the number of bits necessary to encode the modulus n, that is to say that k satisfies the relationship: 2^(k−1)≦n<2^(k);

[0160] ω is the radix.

[0161] Thus if for example k=512 bits and if the radix ω=4 bits, s=128.

[0162] Moreover, if the division of k by the radix ω does not give a natural integer, it is possible to add to the binary representation of the modulus n most significant bits equal to 0 in such a way that the binary representation of the modulus n thus obtained contains a number of bits k′ which is a multiple of the radix ω.
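The number of iterations s and the padded length k′ may be obtained as in the following short sketch, in which the function name is hypothetical and the rounding up of k to the next multiple of ω reflects the padding with most significant bits equal to 0 described above.

```python
def iteration_count(k, w):
    """Return (s, k_padded) where k_padded = s * w is the smallest multiple of w >= k."""
    s = -(-k // w)   # ceiling of k / w using integer arithmetic
    return s, s * w
```

For k=512 and ω=4 no padding is needed and s=128; for k=512 and ω=3 the modulus is extended by one bit so that k′=513 and s=171.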

[0163] The loop of operations 18 includes four successive operations 24, 26, 28 and 30.

[0164] The first operation 24 of the loop of operations 18 consists of carrying out a first operation of addition and allocating the result to the variable u according to the following relationship:

u:=u+{overscore (a)}_(i).{overscore (b)};

[0165] where:

[0166] {overscore (a)}_(i) represents the ω least significant bits of the variable {overscore (a)} after an i^(th) shift to the right of ω bits of the binary representation of {overscore (a)}, i corresponding to the index i of the variable {overscore (a)}_(i);

[0167] {overscore (b)} represents the Montgomery remainder of the input variable b;

[0168] u is the variable initialised during step 16.

[0169] All of the values of the products {overscore (a)}_(i).{overscore (b)} when the value of the index i varies from 0 to s−1 will be called hereafter “the first products”.

[0170] The operation 26 consists of allocating to a variable m the result of the multiplication of a variable u₀ by n′₀ modulo 2^(ω) according to the following relationship:

m:=u₀.n′₀ mod 2^(ω);

[0171] where:

[0172] u₀ represents the ω least significant bits of the variable u previously calculated during the operation 24;

[0173] n′₀ is the variable calculated during the step 2 of the method of FIG. 1;

[0174] ω is the radix.

[0175] The operation 28 consists of carrying out a second operation of addition then allocating the result to the variable u according to the following relationship:

u:=u+m.n;

[0176] where:

[0177] u is the variable previously defined;

[0178] m is the variable calculated during the operation 26;

[0179] n is the modulus of the modular multiplication of FIG. 1.

[0180] All of the possible values of the products m.n when the value of m varies from 0 to 2^(ω)−1 will be referred to hereafter as “the second products”.

[0181] The operation 30 consists of carrying out an operation of division of the variable u by a power of 2, then allocating the result of the division to the variable u according to the following relationship:

u:=u/2^(ω);

[0182] where:

[0183] u is the variable previously calculated;

[0184] 2^(ω) is the power of 2, ω being the radix.

[0185] At the end of the loop of operations 18, the step 20 is performed. This step consists of carrying out an operation of reduction if the value of the variable u obtained at the end of the loop of operations 18 is greater than n, n being the modulus. The reduction operation consists of allocating to the variable u the result of the subtraction u minus n according to the following relationship:

u:=u−n;

[0186] where u and n are respectively the value calculated during the loop of operations 18 and the modulus of the modular multiplication of FIG. 1.
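Steps 16, 18 and 20 of the method of FIG. 2 may be sketched in software as follows. This is a plain arithmetic model with conventional additions, not the carry-save implementation of the invention. It rests on several assumptions which should be read as such: the modulus n is odd, the inputs are smaller than n, p is taken equal to 2^(s.ω), n′₀ is obtained by negation modulo 2^(ω), and the final comparison is written u ≥ n so that the boundary case u = n is also reduced. The function name is hypothetical.

```python
def mon_pro_high_radix(a_bar, b_bar, n, w, s):
    """Model of the loop of FIG. 2: returns a_bar * b_bar * 2^(-s*w) mod n."""
    mask = (1 << w) - 1
    # Step 2 of FIG. 1: n'0 = -(n0^-1) modulo 2^w (requires Python 3.8+).
    n0_p = (-pow(n & mask, -1, 1 << w)) & mask

    u = 0                                   # step 16: initialisation
    for i in range(s):                      # step 18: for i = 0 to s-1
        a_i = (a_bar >> (i * w)) & mask     # i-th group of w bits of a_bar
        u = u + a_i * b_bar                 # operation 24: first addition
        m = ((u & mask) * n0_p) & mask      # operation 26: m := u0.n'0 mod 2^w
        u = u + m * n                       # operation 28: second addition
        u = u >> w                          # operation 30: exact division by 2^w
    if u >= n:                              # step 20: reduction
        u = u - n
    return u
```

The choice of m at operation 26 makes the w least significant bits of u equal to 0 after operation 28, so the shift at operation 30 loses no information.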

[0187] It will be noted that the Montgomery method described in FIGS. 1 and 2 transforms modulo n multiplications into modulo 2^(ω) multiplications. The modulo 2^(ω) multiplications are performed much more quickly on conventional computing means. However, it is known that this gain in speed at the level of the modular multiplications is counterbalanced by the slowness of the calculation of the remainders {overscore (a)} and {overscore (b)} during steps 4 and 6 of FIG. 1.

[0188] The high-radix Montgomery method is currently used with a radix value equal to 8, this value corresponding to a byte (8-bit word). Surprisingly it was determined by tests that this radix value was not the optimum for speeding up the time required to perform the calculation of a high-radix Montgomery product in the following conditions:

[0189] the calculation is carried out on large numbers. The designation “large numbers” is intended to mean natural integers encoded in binary form on at least 320 bits;

[0190] the calculation is carried out by computing hardware. The designation “computing hardware” is intended here to mean electronic components, or sets of electronic components, specially designed to carry out the calculation. Polyvalent computing means, such as a conventional computer associated with a programme enabling this calculation to be carried out, are effectively excluded from this hardware.

[0191] The following tests were carried out for variables a, b and n encoded in binary form on 512 bits, that is to say for a value of the variable k, previously defined, equal to 512 bits. The tests consist in a first step of designing hardware for calculation of a Montgomery product according to the method of FIG. 2. In a second step the tests consist of determining the time required to perform a calculation of a Montgomery product according to the method of FIG. 2 on the computing hardware designed during the first step and for the maximum operating frequency of this hardware. It will be noted in the following numerical examples that the maximum operating frequency of the hardware decreases as the value of the radix ω increases. For the following numerical results the computing hardware is formed with an FPGA (field programmable gate array) component having the reference 10K200E-1. In these conditions the results obtained are as follows:

[0192] For a radix ω equal to 2 bits, the maximum operating frequency of the computing hardware is 66 MHz. The time required to perform a Montgomery product calculation according to the method of FIG. 2 is 8280 nanoseconds.

[0193] For a radix ω equal to 3 bits, the maximum operating frequency of the computing hardware is 60 MHz. The time required to perform a Montgomery product calculation according to the method of FIG. 2 is 6447 nanoseconds.

[0194] For a radix ω equal to 4 bits, the maximum operating frequency of the computing hardware is 50 MHz. The time required to perform a Montgomery product calculation according to the method of FIG. 2 is 5940 nanoseconds.

[0195] For a radix ω equal to 5 bits, the maximum operating frequency of the computing hardware is 40 MHz. The time required to perform a Montgomery product calculation according to the method of FIG. 2 is 6475 nanoseconds.

[0196] Therefore it will be appreciated upon reading the results of these tests that in order to optimise the time required to perform a Montgomery product calculation according to the method of FIG. 2 for large numbers encoded on 512 bits the radix must be chosen to be equal to 4 bits.
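
By way of illustration only, the four measurements reported above can be tabulated and compared as in the following Python sketch. The variable names are hypothetical, and the clock-cycle count derived as time multiplied by frequency is an assumption added for illustration, not a figure given in the tests.

```python
# Measured results reported above for k = 512 bits (method of FIG. 2).
# radix in bits -> (maximum operating frequency in MHz, time in nanoseconds)
results = {
    2: (66, 8280),
    3: (60, 6447),
    4: (50, 5940),
    5: (40, 6475),
}

# Radix giving the shortest Montgomery product time.
best_radix = min(results, key=lambda w: results[w][1])

# Approximate clock cycles per product: time (ns) * frequency (MHz) / 1000.
# This derived quantity is an illustration, not a value stated in the text.
cycles = {w: t_ns * f_mhz / 1000 for w, (f_mhz, t_ns) in results.items()}

print(best_radix)
for w in sorted(cycles):
    print(w, round(cycles[w], 1))
```

Under this reading the cycle count keeps falling as the radix grows while the achievable frequency also falls, and the two effects balance at a radix of 4 bits in these measurements.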

[0197] In a similar fashion it has been determined that a value of the radix equal to 4 bits also makes it possible to optimise the time required to perform the calculation of a Montgomery product according to the method of FIG. 2 for large numbers encoded on 1024 bits.

[0198] There is another method in existence for calculating the Montgomery products which is known by the name of “Montgomery method in its simple form”. This method corresponds to the high-radix Montgomery method in the case where the radix is equal to 1 bit. Consequently this method will not be described in greater detail here, and it will simply be considered that the high-radix Montgomery method also includes the case where the radix is equal to 1 bit.

[0199] FIGS. 3A and 3B show an electronic diagram of a carry-save adder and an electronic diagram of a conventional adder.

[0200] On the diagrams A_(i), B_(i), D_(i), C_(i) and S_(i) denote respectively the i^(th) bits starting from the right of the binary representation of variables A, B, D, C and S, the bit furthest to the right of each representation having an index i equal to zero.

[0201] The carry-save adder of FIG. 3A comprises three cells 40, 42 and 44. These cells 40, 42 and 44 are respectively connected at the input to first means (not shown) for storage of the bits A₀, B₀ and D₀, the bits A₁, B₁ and D₁ and the bits A₂, B₂ and D₂ of the input variables A, B and D. They are also connected at the output respectively to second means (not shown) for storage of the bits C₁ and S₀, C₂ and S₁, and C₃ and S₂ of the output variables C and S.

[0202] The cell 40 is adapted to calculate the value of the bit S₀ according to the following relationship:

S ₀ :=A ₀ ⊕B ₀ ⊕D ₀

[0203] where:

[0204] A₀, B₀ and D₀ are input bits of the cell;

[0205] ⊕ represents the logical operation “exclusive OR”.

[0206] The cell 40 is also adapted to calculate the value of the bit C₁ according to the following relationship:

C ₁ :=A ₀ .B ₀ +A ₀ .D ₀ +B ₀ .D ₀

[0207] where:

[0208] A₀, B₀ and D₀ are defined above;

[0209] + represents the logical operation “OR”;

[0210] . represents the logical operation “AND”.

[0211] In a manner similar to the cell 40 the cell 42 is adapted to calculate the output bits C₂ and S₁ according to the following two relationships:

S ₁ :=A ₁ ⊕B ₁ ⊕D ₁;

C ₂ :=A ₁ .B ₁ +A ₁ .D ₁ +B ₁ .D ₁.

[0212] In a manner similar to the cells 40 and 42, the cell 44 is adapted to calculate the output bits S₂ and C₃ according to the following two relationships:

S ₂ :=A ₂ ⊕B ₂ ⊕D ₂;

C ₃ :=A ₂ .B ₂ +A ₂ .D ₂ +B ₂ .D ₂.

[0213] The operation which consists of calculating the output bits of the variables S and C as a function of the input bits according to the preceding relationships is called a carry-save addition.

[0214] It will be noted that at the output of the carry-save adder, the result of the addition of the three input variables A, B and D is registered in the two output variables C and S, C and S forming what is called a carry-save ordered pair, denoted (C, S). In order to obtain the result of the addition of the three input variables A, B and D in one single variable U, the variables C and S must be recombined according to the following relationship:

U:=C+S

[0215] where:

[0216] C and S are the variables of the carry-save ordered pair obtained at the output of the carry-save adder;

[0217] + represents the conventional addition operation.

[0218] The method consisting of adding the bits of the input variables according to the preceding relationships in order to obtain a carry-save ordered pair, then recombining the variables of the carry-save ordered pair in order to obtain the final result of the addition of the input variables is known by the name of “the carry-save method”. Thus the carry-save method is made up of an operation of carry-save addition followed by an operation of recombination of the carry-save ordered pair.
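
By way of illustration, the carry-save addition and the recombination described above may be sketched in software as follows. The function names are hypothetical; the bitwise operators of Python act on every bit position at once and merely stand in for the parallel cells 40, 42 and 44.

```python
def carry_save_add(a: int, b: int, d: int) -> tuple[int, int]:
    """Return the carry-save ordered pair (C, S) for three non-negative integers."""
    # Sum bits: S_i = A_i XOR B_i XOR D_i, for all bit positions at once.
    s = a ^ b ^ d
    # Carry bits: C_(i+1) = A_i.B_i + A_i.D_i + B_i.D_i, hence the shift by one position.
    c = ((a & b) | (a & d) | (b & d)) << 1
    return c, s


def recombine(c: int, s: int) -> int:
    """Recombination of the carry-save ordered pair by a conventional addition: U := C + S."""
    return c + s


# Three 3-bit inputs, as in the three-cell adder of FIG. 3A (values chosen arbitrarily).
A, B, D = 0b101, 0b011, 0b110
C, S = carry_save_add(A, B, D)
U = recombine(C, S)
print(C, S, U)  # prints 14 0 14
```

The recombined value U equals A+B+D for any inputs, since each cell output pair (C_(i+1), S_i) encodes the two-bit sum of its three input bits.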

[0219] The time required to perform the calculation of C₁ and S₀ by the cell 40 is denoted λ. It is assumed that the time required to perform the calculation of C₂ and S₁ and of C₃ and S₂ by their respective cells 42 and 44 is also equal to λ. In these conditions the time required to perform the carry-save addition between the three input variables A, B and D is equal to λ. In fact the bits of the binary representations of the variables A, B and D are processed in parallel by the cells 40, 42 and 44. This result can be generalised for carry-save adders including numerous cells, in such a way as to be able to carry out carry-save additions on large numbers as defined previously.

[0220] It will be noted that a carry-save adder can also be provided by software such as a programme permitting processing in parallel of the carry-save addition operations.

[0221] FIG. 3B shows a conventional adder adapted to carry out the conventional addition of two input variables A and B and to store the result in an output variable S.

[0222] This conventional adder comprises three cells 48, 50, 52.

[0223] The cell 48 is connected to the output of first means (not shown) for storage of the bits A₀ and B₀ and to the input of second means (not shown) for storing the bit S₀. It is also connected to an input of the cell 50. This cell 48 is adapted to add the bits A₀ and B₀ in a conventional manner and to transmit the carry digit of this addition to the cell 50. The result of this addition is stored in the second means for storage of the bit S₀.

[0224] The cell 50 is connected to the output of first means (not shown) for storage of the bits A₁ and B₁ and to the input of second means (not shown) for storage of the bit S₁. It is also connected to an input of the cell 52. This cell 50 is adapted to add the bits A₁ and B₁ in a conventional manner and to transmit the carry digit of this addition to the cell 52. The result of this addition is stored in the second means for storage of the bit S₁.

[0225] The cell 52 is connected to the output of first means (not shown) for storage of the bits A₂ and B₂ and to the input of second means (not shown) for storage of the bits S₂ and S₃. This cell 52 is adapted to add the bits A₂ and B₂, the result and the carry digit of this addition being stored in the second means for storage, in the bits S₂ and S₃ respectively.

[0226] The time required to perform the calculation of S₀ by the cell 48 is denoted λ and it is assumed that the time required to perform the calculation of S₁ and of S₂, S₃ respectively by the cells 50 and 52 is identical to that of the cell 48. It will be noted upon reading the description of this conventional adder that the performance of the calculation of S₁ by the cell 50 can only commence when the cell 48 has transmitted the carry digit of the addition of the bits A₀ and B₀, that is to say when the calculation of S₀ is terminated. Likewise, the performance of the calculation of S₂, S₃ by the cell 52 can only commence when the cell 50 has finished the calculation of S₁. Consequently, the addition of the two input variables A, B by the adder of FIG. 3B necessitates a time to perform it of 3 λ.

[0227] Therefore it will be appreciated that in order to add three input variables A, B and D with the aid of the conventional adder of FIG. 3B, the time required to perform the calculation is 3 λ for a first addition of A to B to which it is appropriate to add 3 λ, corresponding to the time required to perform a second addition between the result of the first addition and the variable D. Thus to carry out an addition between three input variables A, B and D with the aid of this conventional adder necessitates a time of 6 λ.

[0228] With the aid of this simplified example it is established that the time required to perform a conventional addition is proportional to the number of bits of the input variables.

[0229] By way of comparison it may be assumed that the performance time λ is the same for the cells 40, 42, 44, 48, 50 and 52 of FIGS. 3A and 3B. Thus a carry-save addition between the variables A, B and D with the aid of the carry-save adder is performed in a time λ. In order to obtain the results of the addition in one single variable, the variables C and S must be recombined by carrying out a conventional addition operation between them which is performed in a time 3 λ. The total time required to perform the addition of the variables A, B and D using a carry-save adder is then equal to 4 λ, as against 6 λ in the case where only conventional adders are used.
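
The comparison above can be restated as a small timing model. The following Python sketch only encodes the simplified assumptions of the text (one delay λ per cell, carries rippling from cell to cell in the conventional adder, and a recombination counted as one conventional addition over the same number of cells); the function names are hypothetical and the model is not meant to describe any particular adder circuit.

```python
def conventional_time(bits: int, operands: int) -> int:
    """Time, in units of lambda, to add `operands` numbers with a ripple adder of `bits` cells.

    Each two-input addition costs `bits` cell delays because the carry passes
    through the cells one after another; adding n operands needs n - 1 additions.
    """
    return (operands - 1) * bits


def carry_save_time(bits: int) -> int:
    """Time, in units of lambda, to add three numbers by the carry-save method.

    One cell delay for the carry-save addition (all cells work in parallel),
    then one conventional addition to recombine C and S.
    """
    return 1 + conventional_time(bits, 2)


print(conventional_time(3, 3), carry_save_time(3))      # prints 6 4
print(conventional_time(512, 3), carry_save_time(512))  # prints 1024 513
```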

[0230] It will also be appreciated upon reading the preceding description that the gain in time achieved by virtue of the use of carry-save adders is all the more substantial as the additions are carried out on large numbers. In fact, the time required to perform a conventional addition is proportional to the number of bits of the input variables, which is not the case for a carry-save addition.

[0231] However, it is known that the use of carry-save adders is only useful in order to carry out additions between three input variables. Moreover, the result obtained at the output of a carry-save adder is presented in the form of a carry-save ordered pair which necessitates recombination of the output variables C and S by a conventional addition, thus limiting the usefulness of a carry-save adder. It has also been appreciated that it is difficult to carry out arithmetic operations on a variable represented in the form of a carry-save ordered pair. For example it is not possible simply to carry out an operation of division by a power of 2, denoted 2^(ω), of a carry-save ordered pair according to the following relationship:

(C, S)/2^(ω):=(C/2^(ω) , S/2^(ω))

[0232] where C and S are the variables of the carry-save ordered pair.

[0233] This difficulty is illustrated on the example of FIG. 4 where:

C=0110 0000 0010; and

S=0100 1001 1110.

[0234] By recombination of the variables C and S according to the relationship C+S the following result is obtained:

C+S=1010 1010 0000 (=2720 dec).

[0235] By division of the recombined carry-save ordered pair C+S by a power of 2, in this case 16, the following result is then obtained:

(C+S)/16=1010 1010 (=170 dec).

[0236] Now, if the same calculation is carried out but with the order of the operations reversed, that is to say that first of all the division operation and then the recombination operation is carried out, then the following numerical results are obtained in succession:

C/16=0110 0000;

S/16=0100 1001;

C/16+S/16=1010 1001 (=169 dec).

[0237] It will therefore be noted that the simple division of each variable C and S by a power of two does not permit the exact result to be obtained. It is therefore necessary to recombine the carry-save ordered pair (C, S) before performing a division of a variable stored in the form of a carry-save ordered pair. No known solution to this problem exists in the current prior art.
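
The numerical example of FIG. 4 can be reproduced with the following Python sketch, given purely as an illustration. The variable names are hypothetical. The last quantity computed, the carry produced by adding the discarded low-order bits of C and S, is an elementary arithmetic identity and accounts for the difference of one between the two results.

```python
# Values of the example of FIG. 4.
C = 0b0110_0000_0010
S = 0b0100_1001_1110
w = 4  # division by 2**4 = 16

# Recombination first, then division.
exact = (C + S) >> w

# Division of each variable first, then recombination.
naive = (C >> w) + (S >> w)

# Carry produced by adding the w least significant bits of C and S,
# that is to say the bits discarded by the two separate divisions.
mask = (1 << w) - 1
lost_carry = ((C & mask) + (S & mask)) >> w

print(exact, naive, lost_carry)  # prints 170 169 1
```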

[0238] Upon reading the known drawbacks of the carry-save adders, it will be appreciated that it is not obvious to use these adders within the framework of the calculation of a Montgomery product. In fact, the known methods of calculation of a Montgomery product only involve addition operations between two variables and not three. Furthermore, these known methods include, particularly in the case of the high-radix method, arithmetic operations which cannot be carried out on carry-save pairs, such as the operation 30 of FIG. 2.

[0239] FIG. 5 shows a method according to the invention for calculation of a Montgomery product between two input variables, denoted {overscore (a)} and {overscore (b)}, corresponding to the remainders calculated during the steps 4 and 6 of the method of FIG. 1. In order to present this method, the same notations are used as those defined with regard to FIG. 2.

[0240] FIG. 5 comprises three successive principal steps 70, 72 and 74, the step 70 being a step of initialisation, the step 72 being a step of iteration of a loop of operations, and the step 74 being a step of recombination and reduction of the result.

[0241] The initialisation step 70 consists of initialising the variables necessary for the calculation of the Montgomery product according to the following relationships:

C1:=0;

S1:=0;

C2:=0;

S2:=0;

R:=0;

[0242] where:

[0243] C1 and S1 are variables of a first carry-save ordered pair denoted (C1, S1);

[0244] C2 and S2 are variables of a second carry-save ordered pair denoted (C2, S2);

[0245] R is a variable for storage and cumulative totalling of carry digits, the significance of which will become apparent upon reading the following description.

[0246] The step 70 also consists of pre-calculating the first products {overscore (a)}_(i).{overscore (b)} defined with regard to the operation 24 of FIG. 2.

[0247] For this, {overscore (b)} is multiplied by all the possible values of {overscore (a)}_(i), that is to say the natural integers between 0 and 2^(ω)−1.

[0248] The second step 72 consists of reiterating a loop of operations as long as an index, denoted i, is not greater than s−1, the index i being incremented at the end of each iteration of the loop. This loop of operations is denoted in a conventional manner “for i=0 to s−1”. The variable s which determines the number of iterations is defined in an analogous manner to that of step 18 of FIG. 2.

[0249] The loop of operations 72 comprises four successive operations 76, 78, 80 and 82.

[0250] The operation 76 consists of carrying out a first operation of carry-save addition between the variables C2 divided by 2^(ω), S2 divided by 2^(ω) and one of the first products {overscore (a)}_(i).{overscore (b)} defined with regard to the operation 24 of FIG. 2. This addition operation is carried out with the aid of a carry-save adder according to the following relationship:

(C1, S1):=C2/2^(ω) +S2/2^(ω) +{overscore (a)} _(i) .{overscore (b)}

[0251] where:

[0252] ω is the radix;

[0253] (C1, S1) is the first carry-save ordered pair formed by the variables C1 and S1;

[0254] {overscore (a)}_(i).{overscore (b)} is one of the first products;

[0255] C2 and S2 are the variables of the second carry-save ordered pair (C2, S2).

[0256] It will be noted that this operation 76 fulfils the same function as the operations 24 and 30 of FIG. 2, but the first addition operation is carried out with the aid of a carry-save adder.

[0257] The operation 78 consists of carrying out the conventional addition of the variables C1₀, S1₀ and (R/2^(ω))₀, multiplying the sum by n′₀, and then allocating the result of this operation to a variable m, according to the following relationship:

m:=(C1₀ +S1₀+(R/2^(ω))₀). n′₀

[0258] where:

[0259] C1₀ and S1₀ represent the ω least significant bits respectively of the variables C1 and S1, ω being the radix;

[0260] (R/2^(ω))₀ represents the ω least significant bits of the result of the division of R by 2^(ω), ω being the radix;

[0261] n′₀ is the variable calculated during step 2 of the method of FIG. 1;

[0262] m is a variable in which the result is stored.

[0263] The operation 80 consists of carrying out a second operation of addition between the variables C1, S1 and one of the second products m.n defined with regard to the operation 28 of FIG. 2. This addition is carried out by a carry-save adder and the result is allocated to the variables C2, S2 of the second carry-save ordered pair according to the following relationship:

(C2,S2):=C1+S1+m.n

[0264] where:

[0265] C1 and S1 are the variables previously calculated;

[0266] m.n is one of the second products;

[0267] S2 and C2 are the variables of the second carry-save ordered pair.

[0268] It will be noted that the operation 80 fulfils the same function as the second addition operation of FIG. 2, but it is carried out with the aid of a carry-save adder.

[0269] The operation 82 consists of calculating the variable R by adding the variables C2₀, S2₀, and the value of the variable R in a conventional manner. The result is allocated to the variable R according to the following relationship:

R:=C2₀ +S2₀ +R

[0270] where:

[0271] C2₀, S2₀ are respectively the ω least significant bits of the variables C2 and S2, ω being the radix;

[0272] R is the variable for storage and cumulative totalling of the carry digits.

[0273] In fact, it has been discovered that the difference in result between the operation (C2+S2)/2^(ω) and the operation (C2/2^(ω)+S2/2^(ω)), as illustrated by the example of FIG. 4, is equal to the carry digit of the operation C2₀+S2₀. Therefore the carry digit of the operation C2₀+S2₀ is here called “the carry digit which is at risk of being lost by the division of each variable C2 and S2 by a power of 2, denoted 2^(ω)”. Therefore this operation 82 calculates the carry digit which is at risk of being lost by the division of each variable C2 and S2 of the second carry-save ordered pair by the power 2^(ω) during the operation 76. Furthermore, here the operation 82 cumulatively totals the carry digit of the addition of C2₀+S2₀ at each iteration of the loop of operations 72 for subsequent use in the step 74.

[0274] The step 74 of recombination and reduction is made up of a recombination operation 84 followed by a reduction operation 86.

[0275] The operation 84 consists of carrying out a conventional addition between the variable C2 divided by 2^(ω), the variable S2 divided by 2^(ω) and the variable R divided by 2^(ω), the result being allocated to a variable u according to the following relationship:

u:=C2/2^(ω) +S2/2^(ω) +R/2^(ω)

[0276] where:

[0277] ω is the radix;

[0278] C2, S2 and R are the variables previously calculated during the loop of operations 72;

[0279] u is a variable of storage of the result of the operation.

[0280] It will be noted that this operation is a combination of the following operations:

[0281] A division by 2^(ω) of each variable of the carry-save ordered pair (C2, S2).

[0282] An operation of extraction from the cumulative totalling of the carry digits calculated during the execution of the loop of operations 72, this operation being carried out by shifting the variable R to the right by ω bits.

[0283] An operation of recombination of the second carry-save ordered pair (C2, S2) calculated during the execution of the loop of operations 72.

[0284] An operation of addition to the previously recombined second carry-save ordered pair of the cumulative total of the carry digits which would have been lost if they had not been stored and cumulatively totalled in the variable R during the execution of the loop of operations 72. Thus this operation makes it possible to restore the true value of the result at the end of the loop of operations 72 in spite of the operations of division of each variable of a carry-save ordered pair.

[0285] The operation 86 consists of carrying out a reduction operation if the variable u is greater than or equal to the modulus n according to the following relationship:

u:=u−n

[0286] where u is the result of the Montgomery product.

[0287] This operation is denoted in a conventional manner: “if u≧n then u:=u−n”.
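
By way of non-limiting illustration, the steps 70, 72 and 74 may be sketched in software as follows. This is a sketch under stated assumptions and not a description of the hardware: the function and variable names are hypothetical, Python integers stand in for the registers, and the bitwise operators stand in for the parallel cells of the carry-save adders. One point in particular is an assumption of the sketch: in operation 82 the previous value of R enters the sum shifted right by ω bits, whereas the relationship above is written with R unshifted. With this reading, C2/2^(ω)+S2/2^(ω)+R/2^(ω) equals, at the end of each iteration, the running value of the standard high-radix Montgomery loop, so that the result can be checked against a direct modular computation.

```python
def carry_save_add(a, b, d):
    """Carry-save addition: returns (C, S) with C + S == a + b + d."""
    s = a ^ b ^ d
    c = ((a & b) | (a & d) | (b & d)) << 1
    return c, s


def mon_pro_carry_save(a_bar, b_bar, n, k, w=4):
    """Montgomery product a_bar * b_bar * 2**(-s*w) mod n, for odd n < 2**k.

    a_bar and b_bar are assumed to be smaller than n; s is the number of
    w-bit words needed to hold k bits.
    """
    s = -(-k // w)                           # ceiling of k / w
    mask = (1 << w) - 1
    n0_prime = (-pow(n, -1, 1 << w)) & mask  # -n^(-1) mod 2**w (Python 3.8+)

    # Step 70: initialisation and pre-calculated products.
    c1 = s1 = c2 = s2 = r = 0
    first_products = [d * b_bar for d in range(1 << w)]  # all a_i . b
    second_products = [m * n for m in range(1 << w)]     # all m . n

    # Step 72: loop of operations.
    for i in range(s):
        a_i = (a_bar >> (w * i)) & mask
        # Operation 76.
        c1, s1 = carry_save_add(c2 >> w, s2 >> w, first_products[a_i])
        # Operation 78: small conventional additions, result kept on w bits.
        m = (((c1 & mask) + (s1 & mask) + (r >> w)) * n0_prime) & mask
        # Operation 80.
        c2, s2 = carry_save_add(c1, s1, second_products[m])
        # Operation 82. ASSUMPTION: the previous R is shifted right by w bits.
        r = (c2 & mask) + (s2 & mask) + (r >> w)

    # Step 74: recombination (operation 84) and reduction (operation 86).
    u = (c2 >> w) + (s2 >> w) + (r >> w)
    if u >= n:
        u -= n
    return u
```

For example, with the hypothetical values n = 13, k = 4 and ω = 4, `mon_pro_carry_save(7, 9, 13, 4)` returns 8, which is 7·9·16⁻¹ mod 13.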

[0288] The method of calculation of a Montgomery product according to the invention is clearly faster than the known method of FIG. 2. In fact, the first and second addition operations 76 and 80 are carried out with the aid of carry-save adders, whilst in the known method the first and second addition operations 24 and 28 are carried out with the aid of at least one conventional adder. Furthermore, the method of FIG. 5 discloses a method of carrying out a division of a variable represented in the form of a carry-save ordered pair by a power of 2, which avoids a step of recombination of the carry-save ordered pair before performing this division. This speeding up of the time required to perform the Montgomery product calculation is all the more substantial as the input variables {overscore (a)}, {overscore (b)} are larger, i.e. encoded on a substantial number of bits (greater than 320 bits).

[0289] It will be noted that the operations 78 and 82 include additions on small numbers encoded on ω bits and that an optimisation of the time required to perform these two operations has no significant effect.

[0290] Moreover, the operations 84 and 86 are carried out less frequently than the operations of the loop 72, and consequently an optimisation of the time required to perform them, whilst possible, has no more effect than that of the operations of the loop 72. However, in a variant these operations are speeded up. An embodiment of this variant will be presented with regard to FIG. 9.

[0291] In another variant, all of the second products m.n are calculated before the loop of operations 72 is executed and are stored in a memory. Thus the operations of calculating the first products {overscore (a)}_(i).{overscore (b)} and the second products m.n during the loop of operations 72 are replaced by operations of selection of the results of these calculations in the said memory.

[0292] In a variant, the radix ω is chosen to be equal to 4 bits in such a way as to optimise the time required to perform the calculation of the Montgomery product between input variables encoded on 512 or 1024 bits on computing hardware. In fact, it has been determined in a manner similar to that described with regard to the method of FIG. 2 that for such input variables a value of the radix ω equal to 4 bits speeds up the time required to perform the Montgomery product calculation.

[0293] The embodiment will preferably be a combination of the method of FIG. 5 and the two variants described above.

[0294] FIG. 6 shows a method of calculation of a modular exponentiation according to the m-ary method in order to carry out the following calculation:

M^(E) mod n

[0295] where:

[0296] M, E and n are natural integers encoded in binary form on a maximum of k bits,

[0297] M is the message; E is the exponent; and n is the modulus.

[0298] The m-ary method of calculating a modular exponentiation is known, and therefore the description which follows only has the aim of introducing the elements necessary for an understanding of the invention. The reader may refer to document D1, chapter 2.4 “The m-ary Method” for more detailed information.

[0299] FIG. 6 includes four successive steps 90, 92, 94 and 96.

[0300] The step 90 consists of calculating and registering in a memory the following exponentiations of the variable M:

M^(α) mod n;

[0301] where:

[0302] M is the message;

[0303] α is an exponent;

[0304] n is the modulus.

[0305] The preceding exponentiation is calculated for all the values of the exponent α between 2 and m−1, m being equal to 2^(r), where r is a parameter pre-defined by the user. This step is represented in a conventional manner in FIG. 6 by the caption “M^(α) mod n for all α=2, 4, . . . m−1”.

[0306] The step 92 consists of cutting the binary representation of the exponent E into s′ r-bit words, each denoted F_(i), where i is an index of the word and varies from 0 for the word furthest to the right in the binary representation of E to s′−1 for the word furthest to the left of this same binary representation. s′ is calculated according to the following relationship:

k=s′.r

[0307] where:

[0308] k is the number of bits of the binary representation of E;

[0309] r is the pre-defined parameter.

[0310] If k is not divisible by r, bits equal to 0 are added to the left of the binary representation of the exponent E in order to obtain a binary representation including a number of bits divisible by the parameter r. For example, if r and k are respectively equal to 5 and 512 bits then 3 bits of zero value are added to the left of the binary representation of the exponent E in order to obtain a binary representation including 515 bits, which makes it possible to obtain s′ equal to 103.
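
The cutting of the exponent into r-bit words, including the padding described above, may be illustrated by the following Python sketch. The function name is hypothetical. The zero bits added on the left need no explicit operation here, since the bits of an integer above its most significant bit are already read as 0; the use of right shifts and a mask is a software choice made for this illustration.

```python
def split_exponent(e: int, k: int, r: int) -> list[int]:
    """Cut the k-bit binary representation of e into s' words of r bits.

    Word F_0 is the rightmost (least significant) word and word F_(s'-1)
    the leftmost one.
    """
    s_prime = -(-k // r)  # ceiling of k / r, i.e. k padded up to a multiple of r
    mask = (1 << r) - 1
    return [(e >> (r * i)) & mask for i in range(s_prime)]


# Example with r = 5 and k = 512: the representation is padded to 515 bits.
words = split_exponent((1 << 512) - 1, k=512, r=5)
print(len(words), words[0], words[102])  # prints 103 31 3
```

In this example the leftmost word F₁₀₂ holds only the two most significant bits of the exponent, preceded by the three padding zeros.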

[0311] The different words F_(i) are obtained, for example, by successive operations of shifting to the left of the exponent E of r bits in a shift left register.

[0312] The step 94 consists of calculating M^(F_(s′−1)) mod n and allocating the result to a variable C according to the following relationship:

C:=M^(F_(s′−1)) mod n;

[0313] where:

[0314] n is the modulus;

[0315] F_(s′−1) is the (s′−1)^(th) word determined during the step 92;

[0316] M is the message;

[0317] C is the variable in which the result of the operation 94 is stored.

[0318] The step 96 consists of reiterating a loop of operations as long as the index i initialised at the value of s′−2 is not strictly less than 0, the index i being decremented at the end of each iteration of the loop. This loop of operations is denoted in a conventional manner “for i=s′−2 downto 0”. The variable s′ which determines the number of iterations has been defined previously.

[0319] This loop of operations comprises two successive operations 98 and 100.

[0320] The operation 98 consists of calculating a modular exponentiation of the variable C and then allocating the result to the variable C according to the following relationship:

C:=C^(2^(r)) mod n

[0321] where:

[0322] C is the variable initialised during the step 94;

[0323] r is the pre-defined parameter;

[0324] n is the modulus.

[0325] The operation 100 consists of calculating a modular multiplication of the variable C, previously obtained during the operation 98, by the variable M^(F_(i)) if the word F_(i) is different from 0 according to the following relationship:

C:=C.M^(F_(i)) mod n

[0326] where:

[0327] n is the modulus;

[0328] F_(i) is the word of index i determined during the step 92;

[0329] C is the variable previously calculated during the operation 98.

[0330] This operation is represented in a conventional manner in FIG. 6 by the caption “If F_(i)≠0 Then C:=C.M^(F_(i)) mod n”.

[0331] At the end of the execution of the loop of operations 96, the variable C contains the result of the modular exponentiation of the message M.
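
By way of illustration, the steps 90 to 96 may be sketched in Python as follows. The function and variable names are hypothetical. Two choices are assumptions of this sketch rather than features stated above: the table also stores the trivial powers M⁰ and M¹ so that it can be indexed directly by F_(i), and the operation 98 is carried out by r successive modular squarings.

```python
def m_ary_pow(M: int, E: int, n: int, k: int, r: int = 5) -> int:
    """Compute M**E mod n by the m-ary method with m = 2**r (steps 90 to 96).

    k is the number of bits of the binary representation of E.
    """
    m = 1 << r
    # Step 90: table of M**alpha mod n for alpha = 2 .. m-1
    # (entries 0 and 1 are added so that powers[F_i] can be read directly).
    powers = [1 % n, M % n]
    for alpha in range(2, m):
        powers.append(powers[alpha - 1] * M % n)

    # Step 92: cut E into s' words of r bits, F_0 being the rightmost word.
    s_prime = -(-k // r)
    F = [(E >> (r * i)) & (m - 1) for i in range(s_prime)]

    # Step 94: start from the leftmost word.
    C = powers[F[s_prime - 1]]

    # Step 96: for i = s' - 2 downto 0.
    for i in range(s_prime - 2, -1, -1):
        for _ in range(r):      # operation 98: C := C**(2**r) mod n
            C = C * C % n
        if F[i] != 0:           # operation 100
            C = C * powers[F[i]] % n
    return C
```

A call such as `m_ary_pow(M, E, n, k=512)` can be compared with Python's built-in `pow(M, E, n)`, which returns the same value.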

[0332] The m-ary method described above for calculating a modular exponentiation implements approximately δ operations of modular multiplication, δ being calculated by the following relationship:

δ=2^(r)−2+k−r+(k/r−1)(1−1/2^(r))

[0333] where:

[0334] k is the number of bits of the exponent E;

[0335] r is the pre-defined parameter.

[0336] This represents a reduction in the number of operations by comparison with other known methods, such as the LR binary algorithm, of 17 to 18% when the exponentiation relates to large numbers encoded on 512 or 1024 bits. However, certain methods are known to be even faster, such as for example the RL binary algorithm which permits parallel operations. Nevertheless, it has been determined experimentally that the m-ary method for a parameter r chosen to be equal to 5 bits is an optimum compromise between the number of modular multiplication operations carried out and the resources necessary in order to implement this method. “Resources” is intended to mean for example the number of cells of an FPGA component.
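
The relationship giving δ can be evaluated numerically, as in the following Python sketch given for illustration only. The function name is hypothetical, and the sketch evaluates the formula alone; it says nothing about the hardware resources that also enter into the choice of r.

```python
def m_ary_multiplications(k: int, r: int) -> float:
    """Approximate number of modular multiplications of the m-ary method."""
    return 2**r - 2 + k - r + (k / r - 1) * (1 - 1 / 2**r)


for k in (512, 1024):
    print(k, round(m_ary_multiplications(k, 5)))
# prints:
# 512 635
# 1024 1246

# Value of r, among 1 to 10, for which the formula is smallest when k = 512.
best_r = min(range(1, 11), key=lambda r: m_ary_multiplications(512, r))
print(best_r)  # prints 5
```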

[0337] FIG. 7 shows a method of calculation of a modular exponentiation according to the invention which is illustrated in the case of the calculation of the following exponentiation:

M^(E) mod n

[0338] where:

[0339] M, E are natural integers encoded in binary form on a maximum of 512 bits;

[0340] M is the message;

[0341] E is the exponent; and

[0342] n is the modulus.

[0343] The method of modular exponentiation according to the invention implements the m-ary method in which the modular multiplications are carried out according to the Montgomery method described with regard to FIG. 1. The Montgomery products are for example calculated according to the method of FIG. 5 with a radix equal to 4 bits. Furthermore, in the particular case described here the parameter r of the m-ary method is chosen to be equal to 5 bits in such a way as to speed up the time required to perform the calculation of the exponentiation for input variables encoded on 512 or 1024 bits.

[0344] This method comprises seven successive steps 110, 112, 114, 116, 118, 120 and 122.

[0345] The step 110 consists of calculating the Montgomery remainder of the message M according to the following relationship:

{overscore (M)}:=M.p mod n

[0346] where:

[0347] M is the message;

[0348] p is the parameter of the Montgomery method defined during the step 4 of the method of FIG. 1 according to the following relationship: p=2^(k), where k is the number of bits of the modulus n;

[0349] n is the modulus;

[0350] {overscore (M)} is the variable in which the remainder of the message M is registered.

[0351] The calculation of the remainder of M is carried out by conventional methods such as the extended Euclidean algorithm.

[0352] The step 112 consists of calculating the variable n′₀ according to the following relationship: n′₀=−n₀⁻¹ mod 2^(ω), where n₀ denotes the ω least significant bits of the modulus n. This calculation has already been described with regard to step 2 of FIG. 1 and therefore it will not be described again here in detail. This calculation is also carried out by conventional methods such as the extended Euclidean algorithm.

[0353] The step 114 consists of calculating all of the second products m.n. For this the product m.n is calculated for each value of m between 0 and 15. In fact, an examination of the operation 26 of FIG. 2 shows that m is congruent with u₀.n′₀ modulo 2^(ω), such that the value of m can only be between 0 and 15 when the radix ω is equal to 4 bits.

[0354] The step 116 consists of raising the remainder {overscore (M)} in the Montgomery sense to the power α for all the different values of α between 2 and 31. In fact, the parameter r of the m-ary method is equal to 5 bits here, and it follows from the step 90 of the method of FIG. 6 that it is not necessary to calculate the powers of {overscore (M)} higher than 31. This step 116 is for example carried out by thirty successive Montgomery product calculations according to the following relationship:

{overscore (M)} ^(α) =MonPro({overscore (M)},{overscore (M)} ^(α−1))

[0355] where MonPro designates a Montgomery product calculated for example according to the method of FIG. 5.

[0356] During this step, the following operations are carried out in succession:

[0357] {overscore (M)}²=MonPro({overscore (M)}, {overscore (M)}), where {overscore (M)} has been calculated during the step 110;

[0358] {overscore (M)}³=MonPro({overscore (M)}, {overscore (M)}²), where {overscore (M)}² has been calculated during the preceding operation;

[0359] etc.

[0360] Thus {overscore (M)}² to {overscore (M)}³¹ are obtained successively.

[0361] The step 118 consists of cutting the exponent E into a succession of 5-bit words called F_(i) in accordance with the step 92 of the m-ary method described with regard to FIG. 6. Then, still in step 118, the value of {overscore (M)}^(F₁₀₂) is allocated to a variable {overscore (C)} according to the following relationship:

{overscore (C)}:={overscore (M)}^(F₁₀₂)

[0362] where F₁₀₂ is the 102^(nd) word F_(i) as defined with regard to the step 94 of FIG. 6.

[0363] It will be noted that during this step {overscore (M)}^(F₁₀₂) does not have to be calculated since this calculation has already been carried out during the step 116.

[0364] The step 120 consists of reiterating a loop of operations as long as an index i initialised at the value 101 is not strictly less than 0, the index i being decremented by 1 with each iteration of the loop of operations. The initial value of the index i is calculated in accordance with the step 96 of FIG. 6 for a parameter r of the m-ary method equal to 5 bits and a value of the variable k equal to 515 bits.

[0365] The loop of operations is made up of two successive operations 126 and 128.

[0366] The operation 126 consists of calculating and storing the raising to the power 32 of the variable {overscore (C)} according to the following relationship:

{overscore (C)}:={overscore (C)}³²

[0367] where:

[0368] {overscore (C)} is the variable initialised at step 118;

[0369] 32 is calculated in accordance with the operation 98 of the m-ary method of FIG. 6, according to the relationship 32=2⁵, where 5 is the value of the parameter r of the m-ary method.

[0370] The operation 128 consists of calculating the Montgomery product of the variable {overscore (C)} by the variable {overscore (M)}^(F_(i)) and storing this result according to the following relationship:

{overscore (C)}:=MonPro({overscore (C)}, {overscore (M)}^(F_(i)))

[0371] where:

[0372] {overscore (M)}^(F_(i)) is selected from amongst the powers of {overscore (M)} calculated at the step 116 knowing the value of F_(i);

[0373] MonPro designates the Montgomery product operation, for example performed in accordance with the method of FIG. 5.

[0374] It will be noted that this operation 128 also includes a test of the value of F_(i), so that the Montgomery product calculation is performed only if the value of F_(i) is different from 0.

[0375] In a variant, the Montgomery product calculation is systematically performed in order to avoid the test of the value of F_(i).

[0376] At the end of the step 120, the step 122 is performed. This step consists of calculating the Montgomery product between the variable {overscore (C)} and the unit 1 and storing this result according to the following relationship:

C:=MonPro({overscore (C)}, 1)

[0377] where:

[0378] {overscore (C)} is the variable calculated at step 120;

[0379] 1 represents the unit;

[0380] C is a variable in which the result of the modular exponentiation of the input message M is registered.
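The steps 110 to 122 described above can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the hardware method of FIG. 5: the function names `mon_pro` and `mary_mon_exp` are hypothetical, the Montgomery product is computed in a single reduction step with R=2^k instead of the high-radix loop, and the entry for the power 0 is added so that an exponent equal to 0 is handled.

```python
def mon_pro(a_bar, b_bar, n, k):
    """Montgomery product a_bar * b_bar * 2^(-k) mod n, for odd n < 2^k.

    Simplified single-step reduction (assumption), not the high-radix loop.
    """
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r      # n' = -n^(-1) mod 2^k (Python 3.8+)
    t = a_bar * b_bar
    m = (t * n_prime) % r
    u = (t + m * n) >> k                # exact division by 2^k
    return u - n if u >= n else u       # final reduction


def mary_mon_exp(msg, e, n, k, r_bits=5):
    """m-ary exponentiation on Montgomery remainders (steps 110 to 122)."""
    m_bar = (msg << k) % n              # step 110: remainder of the message
    # Step 116: powers 2 .. 2^r - 1, each obtained from the previous one.
    powers = {0: (1 << k) % n, 1: m_bar}
    for alpha in range(2, 1 << r_bits):
        powers[alpha] = mon_pro(m_bar, powers[alpha - 1], n, k)
    # Step 118: cut E into r-bit words F_i, least significant word first.
    words = []
    while e:
        words.append(e & ((1 << r_bits) - 1))
        e >>= r_bits
    words = words or [0]
    c_bar = powers[words.pop()]         # C := M^(F_top), already computed
    # Step 120: loop over the remaining words, most significant first.
    for f in reversed(words):
        for _ in range(r_bits):         # operation 126: C := C^(2^r)
            c_bar = mon_pro(c_bar, c_bar, n, k)
        if f:                           # operation 128, skipped when F_i = 0
            c_bar = mon_pro(c_bar, powers[f], n, k)
    return mon_pro(c_bar, 1, n, k)      # step 122: leave the Montgomery domain


# The result agrees with Python's built-in modular exponentiation.
assert mary_mon_exp(123456789, 65537, 1000003, 20) == pow(123456789, 65537, 1000003)
```

In this sketch the remainder of the message is computed once, and every later multiplication stays in the Montgomery domain, which is the benefit of the combination noted in the text.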

[0381] It will be noted that the combination of the m-ary method and the Montgomery method in order to calculate modular multiplications is of particular interest in the case of the calculation of an exponentiation since the Montgomery remainder of the input message M is only calculated once. Thus the drawback of the Montgomery method, that is to say the necessity of calculating the remainders of input variables before carrying out the Montgomery product calculations, is limited. This combination of the m-ary method and the Montgomery method therefore makes it possible to reduce the time required to perform the calculation of a modular exponentiation.

[0382] In a variant it is also possible to combine the method of FIG. 7 with the Chinese remainders method (also called the CRT method). The Chinese remainders method is succinctly described in FIG. 8. This method is known, and the reader may refer for more detail to chapter 4.1: “Fast Decryption using CRT” of the document D1.

[0383] The Chinese remainders method makes it possible to break down a first modular exponentiation operation into two second modular exponentiation operations with smaller exponents and moduli.

[0384] The first modular exponentiation is denoted as follows:

M^(E) mod n

[0385] where:

[0386] M is an input message;

[0387] E is an exponent;

[0388] n is a modulus which is broken down in the form of a product such that n=P.Q, where P and Q are prime natural integers.

[0389] In a first step 130, this first exponentiation is broken down into two second exponentiations, with exponents E1 and E2 respectively, which are calculated separately according to the following relationships:

M1:=M^(E1) mod P

M2:=M^(E2) mod Q

[0390] where:

[0391] M is the input message;

[0392] E1=E mod (P−1);

[0393] E2=E mod (Q−1);

[0394] M1 and M2 are variables for storage of the intermediate results.

[0395] In a following step 134, the result of the first modular exponentiation is obtained by combining the previously calculated variables M1 and M2 according to the following relationship:

M:=M2+[(M1−M2).(Q⁻¹ mod P) mod P].Q

[0396] where:

[0397] M1 and M2 are the variables calculated at step 130;

[0398] Q and P are the prime numbers such that n=P.Q.
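The steps 130 and 134 can be illustrated by the following minimal sketch. The function name `crt_mod_exp` and the small primes used in the check are assumptions chosen for illustration; the two half-size exponentiations use Python's built-in `pow` rather than the method of FIG. 7, and the input message is assumed to be divisible by neither P nor Q.

```python
def crt_mod_exp(m, e, p, q):
    """Compute m^e mod (p*q) from two half-size exponentiations.

    Assumes p and q are distinct primes and m is a multiple of neither.
    """
    e1 = e % (p - 1)                   # E1 = E mod (P-1)
    e2 = e % (q - 1)                   # E2 = E mod (Q-1)
    m1 = pow(m, e1, p)                 # step 130: M1 := M^E1 mod P
    m2 = pow(m, e2, q)                 # step 130: M2 := M^E2 mod Q
    q_inv = pow(q, -1, p)              # Q^(-1) mod P (Python 3.8+)
    h = ((m1 - m2) * q_inv) % p        # [(M1 - M2).(Q^(-1) mod P)] mod P
    return m2 + h * q                  # step 134: recombination


# Toy parameters (assumption): P=61, Q=53, so n=3233.
assert crt_mod_exp(65, 17, 61, 53) == pow(65, 17, 61 * 53)
```

Since M1 and M2 do not depend on each other, the two calls in step 130 are the part that can be carried out in parallel.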

[0399] As k is the number of bits necessary in order to encode the modulus n, it is possible to choose P and Q such that P and Q have a number of bits substantially equal to k/2. In these conditions, it is considered that the Chinese remainders method makes it possible to reduce by a factor 4 the number of operations required in order to calculate the first exponentiation, when this latter is implemented by computing software. This factor is of the order of 2 when the Chinese remainders method is implemented by computing hardware such as an FPGA component. Furthermore, in order to reduce the time required to perform the calculation of the first exponentiation, the calculations of the variables M1 and M2 can be effected in parallel.

[0400] It will be noted that this method thus makes it possible to break down a first modular exponentiation concerning large numbers encoded on 1024 bits into two second modular exponentiations concerning large numbers encoded on 512 bits.

[0401] Estimations of the time required for calculation of a first modular exponentiation have been made in the following conditions:

[0402] the first modular exponentiation concerning large numbers of 1024 bits is broken down into two second modular exponentiations each of 512 bits;

[0403] each of the second modular exponentiations is calculated according to the method of FIG. 7 in which the Montgomery products are calculated according to the method of FIG. 5.

[0404] In these conditions when the method is implemented by an FPGA component working at 40 MHz, the time required to perform the calculation of the first modular exponentiation is substantially equal to 4.71 milliseconds.

[0405] In the same conditions but for large numbers encoded on 102 bits it has been determined that the time required to perform the calculation of a first exponentiation is substantially equal to 17.8 milliseconds.

[0406] FIG. 9 is a schematic representation of computing hardware 150 according to the invention. This hardware is called here a “Montgomery multiplier”. In this Figure only the elements specific to the invention have been shown. The other components which are not shown but are necessary to the implementation of the method of FIG. 5 may be easily determined in a conventional manner on the basis of the elements described previously. Thus the components necessary in order to implement the operations 78 and 82 of FIG. 5 as well as the division operations have not been shown. Equally, the storage buffers for the variables C1, S1, C2, S2, R and u are not shown.

[0407] This multiplier 150 includes a memory 152 connected to the input and the output of specific computing means 154 under the control of control means 156.

[0408] The Montgomery multiplier 150 described here by way of example is adapted to co-operate with the principal computing means (not shown). These principal computing means perform for example a modular exponentiation according to the method of FIG. 7. In such a situation the Montgomery multiplier 150 is a coprocessor which makes it possible to reduce the time required to perform the Montgomery product calculations.

[0409] The memory 152 is connected by means of a data input/output bus to the principal computing means (not shown).

[0410] The memory 152 is adapted to store the following variables:

[0411] the variable {overscore (M)} calculated during the step 110 of the method of FIG. 7;

[0412] the variable n′₀ calculated during the step 112 of the method of FIG. 7;

[0413] the second products m.n calculated during the step 114 of the method of FIG. 7;

[0414] the variables {overscore (M)}^(α) calculated during the step 116 of FIG. 7;

[0415] the variable {overscore (C)} initialised during the step 118 and calculated during the operations 126 and 128 of the method of FIG. 7;

[0416] the unit 1 necessary for carrying out the step 122 of the method of FIG. 7; and

[0417] the first products {overscore (a)}_(i).{overscore (b)} pre-calculated during the step 70 of the method of FIG. 5.

[0418] The specific computing means 154 include a first and a second carry-save adder 157, 158, a first and a second conventional adder 160 and 162, a shift right register 164 and a conventional subtractor 166.

[0419] The first carry-save adder 157 is connected to an output of the memory 152 and to an output of the second carry-save adder 158. It is also connected to the input of the second carry-save adder 158. This carry-save adder is intended here to carry out the first addition operation 76 of the method of FIG. 5. Its structure is conventional and follows from that described with regard to FIG. 3A.

[0420] The second carry-save adder 158 is connected to the output of the memory 152 and to an output of the first carry-save adder 157. It is also connected to an input of the first carry-save adder 157. This adder 158 is intended here to carry out the second addition operation 80 of the method of FIG. 5. Its structure is similar to that of the first carry-save adder 157.
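The principle of the two carry-save additions 76 and 80 can be modelled in a few lines. This is a generic software sketch of carry-save addition, not a description of the cells of FIG. 3A: the function name `carry_save_add` is hypothetical, and the operand values are taken from a small example in which a first product equals 377 and a second product equals 11×165.

```python
def carry_save_add(s, c, x):
    """One carry-save stage: returns (s2, c2) with s2 + c2 == s + c + x.

    No carry is propagated along the word; each bit position is
    processed independently by a full adder.
    """
    s2 = s ^ c ^ x                               # bitwise sum without carries
    c2 = ((s & c) | (s & x) | (c & x)) << 1      # carries, moved one bit left
    return s2, c2


# The variable u is kept as an ordered pair (s, c) between additions.
s, c = 0, 0
s, c = carry_save_add(s, c, 377)        # first addition: u := u + a_i.b
s, c = carry_save_add(s, c, 11 * 165)   # second addition: u := u + m.n
u = s + c                               # recombination by a conventional adder
assert u == 377 + 11 * 165
```

The conventional addition with carry propagation is only needed once, at the recombination, which is why the pair representation shortens each iteration of the loop.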

[0421] The first conventional adder 160 is connected to an input and to the output of the memory 152. This adder is intended to carry out the pre-calculation of the first products {overscore (a)}_(i).{overscore (b)} and the second products m.n. For example, the calculation of the second products m.n is carried out according to the following succession of calculations:

2.N:=N+N

3.N:=N+2.N

4.N:=N+3.N

[0422] etc . . . .

[0423] The results of the calculations of the first and the second products are then stored in the memory 152 at the locations provided for that purpose.
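The succession of additions used to pre-calculate the second products can be sketched as follows. The function name `precompute_multiples` and the list used as a stand-in for the locations of the memory 152 are assumptions made for illustration.

```python
def precompute_multiples(n, omega):
    """Table of the 2^omega products m.n built with additions only."""
    table = [0, n]                       # 0.n and 1.n
    for _ in range(2, 1 << omega):
        table.append(table[-1] + n)      # m.n := n + (m-1).n
    return table


# With a radix of 4 bits there are 16 entries, indexed by the value of m.
table = precompute_multiples(165, 4)
assert len(table) == 16
assert table[11] == 11 * 165
```

Each product then costs one memory read during the loop instead of one multiplication.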

[0424] The second conventional adder 162 is connected to the output of the second carry-save adder 158 and to an input of the subtractor 166. This second adder 162 is intended to carry out the recombination operation of FIG. 5. Its structure follows from that described with regard to FIG. 3B. However, the cells which make it up, such as the cell 48 of FIG. 3B, are grouped in stages of 32 cells. The output of each stage is directly connected to a corresponding stage in the subtractor 166 in such a way that as soon as the calculation of the addition in one of the stages is finished the result is directly transmitted to the corresponding stage of the subtractor 166 without waiting. Thus the subtractor 166 performs the subtraction operation with only one clock cycle delay on the addition operation. This structure is known under the name “pipeline”, and makes it possible to reduce the time required to perform operations.

[0425] The subtractor 166 is adapted to carry out the operation 86 of FIG. 5. Therefore for example it is connected to the outputs of the second conventional adder 162 and of the memory 152. It is also connected to an input of the memory 152 for example in order to store the result of the reduction operation 86.

[0426] The shift right register 164 is adapted to shift to the right by ω bits, ω being the radix of the high-radix Montgomery method. This register 164 is intended to carry out the operations of calculating the {overscore (a)}_(i), the result then being used in order to select one of the corresponding first products {overscore (a)}_(i).{overscore (b)} in the memory 152. The connections of the shift register 164 to the other components of FIG. 9 have not been shown in order to simplify the schematic representation, but such connections can be easily determined.

[0427] The control means 156 are adapted to control the operation of the specific computing means 154 and of the memory 152 in accordance with the method of FIG. 5. These control means are designed in a conventional manner.

[0428] All of the elements in FIG. 9 are, for example, implemented in an FPGA component or in an ASIC component. In a variant this component is associated with other electronic components on an electronic card in such a way as to produce an electronic card conforming to the PCI standard. A card conforming to the PCI standard can be slotted into standard computers, and these latter are then adapted to form the principal computing means.

[0429] In the case of an FPGA component with the reference XILINX XCV1600E-6 operating at 45 MHz, the estimates of the number of clock cycles required in order to perform each step of the method of FIG. 5 are as follows:

[0430] 35 clock cycles for the step 70;

[0431] 260 clock cycles for the step 72;

[0432] 39 clock cycles for the step 74 of recombination and reduction.

[0433] Thus the estimate of the total number of clock cycles in order to calculate a Montgomery product according to the method of FIG. 5 is 334 clock cycles for the input variables encoded on 512 bits.

[0434] In these conditions it has also been estimated that the method of FIG. 7 implements 643 Montgomery products and that the step 114 of FIG. 7 of pre-calculation of the second products m.n necessitates 38 clock cycles. Thus an estimate is obtained of the number of clock cycles necessary in order to calculate a modular exponentiation concerning large numbers of 512 bits equal to 214223 clock cycles. For an operating frequency of the FPGA component of 45 MHz this corresponds to a number of 512-bit exponentiations substantially higher than 200 per second. It will be noted that for this estimate it is considered that the steps 110 and 112 of the method of FIG. 7 are performed by the principal computing means associated with the Montgomery multiplier 150. Consequently the number of clock cycles required in order to execute these two operations is not taken into account in this estimate. However, it is assumed that the time required to perform them is approximately 10 times less than that of steps 114 to 122.
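The figures quoted above can be cross-checked with a short calculation. The breakdown of the 643 Montgomery products into steps 116, 120 and 122 is a reconstruction from the description of FIG. 7 (103 words of 5 bits, hence 102 iterations of the loop) and is an assumption, not a statement of the text; the cycle count and the frequency are the values given in the text.

```python
# Per-product estimate: steps 70, 72 and 74 of the method of FIG. 5.
cycles_per_product = 35 + 260 + 39

# Assumed breakdown of the Montgomery products of the method of FIG. 7.
precomputed_powers = 30            # step 116: powers 2 to 31
loop_products = 102 * (5 + 1)      # step 120: 5 squarings + 1 product per word
final_product = 1                  # step 122: MonPro(C, 1)
products = precomputed_powers + loop_products + final_product

# Rate of 512-bit exponentiations for the cycle count quoted in the text.
total_cycles = 214223
rate_per_second = 45e6 / total_cycles

print(cycles_per_product, products, round(rate_per_second))
```

The rate obtained from the quoted cycle count is a little above 200 exponentiations per second, in line with the estimate given in the text.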

[0435] In a variant the specific computing means 154 comprise one single carry-save adder. In fact, when the method of FIG. 5 is being carried out the first addition operation 76 always precedes the second addition operation 80 since the result of the first addition 76 is used in this second addition operation 80. Consequently the first and the second carry-save adders 157, 158 are never active at the same time, and it is therefore possible to replace them by one single carry-save adder which carries out the first addition operation 76 and the second addition operation 80 alternately.

[0436] FIG. 10 is a schematic representation of the computing hardware 200 according to the invention associated with principal computing means. In this schematic representation only the principal electronic components have been shown, but the other components can be easily determined.

[0437] The principal computing means 201 are adapted to perform the modular exponentiations according to the method of FIG. 7 by co-operating with the computing hardware 200. They are, for example, formed with a computer. In the particular case described here, the means 201 are adapted to perform a first and a second modular exponentiation. The first and the second modular exponentiations are each carried out according to the method of FIG. 7.

[0438] Consequently they implement respectively the first and the second Montgomery products.

[0439] The computing hardware 200 is adapted to form a coprocessor for the principal computing means 201. It includes a Montgomery multiplier 202 associated with means for shifting to the left 204 under the control of first control means 206.

[0440] The Montgomery multiplier 202 is a variant of the Montgomery multiplier 150 of FIG. 9 in which the use of the resources is optimised. In fact it is adapted to perform the first and the second Montgomery product calculations substantially in parallel without nevertheless doubling the resources to be implemented. Thus it makes it possible to divide by two the time required to perform two Montgomery product calculations.

[0441] This Montgomery multiplier 202 includes a memory 210 associated with specific computing means 212 under the control of second control means 214. Just as in FIG. 9, only the principal components have been shown, but the other components can be easily determined.

[0442] The memory 210 is adapted to store the following variables:

[0443] the remainder {overscore (M)} of an input message M of the first exponentiation, calculated during the step 110 of the method of FIG. 7 by the computing means 201;

[0444] the remainder {overscore (M)}′ of an input message M′ of the second exponentiation, calculated during the step 110 of the method of FIG. 7 by the computing means 201;

[0445] the variables n′₀ and n″₀ calculated during the steps 112 of the method of FIG. 7 respectively for the first and the second modular exponentiations;

[0446] the second products m.n and m′.n′ calculated during the steps 114 of the method of FIG. 7 respectively for the first and the second modular exponentiations;

[0447] the variables {overscore (M)}^(α) and {overscore (M)}′^(α) calculated during the steps 116 of the method of FIG. 7 respectively for the first and the second modular exponentiations;

[0448] the variables {overscore (C)} and {overscore (C)}′ calculated during the step 118 and during the operations 126 and 128 of the method of FIG. 7 respectively for the first and the second modular exponentiations;

[0449] the unit 1 necessary in order to perform the step 122 of the method of FIG. 7;

[0450] the moduli n and n′ respectively of the first and the second modular exponentiations.

[0451] The memory 210 includes a first and a second data input buffer in such a way as to register two different data items simultaneously. It also has a first and a second data output buffer in such a way as to make simultaneously available to the specific computing means 212 two different data items, one in each data buffer.

[0452] The specific computing means 212 include a first and a second shift right register 216, 218, a first and a second conventional adder 220, 222, a block of carry-save adders 224 and a block 226 for recombination and reduction.

[0453] The first shift right register 216 is connected to the first data output buffer of the memory 210 and to the input of the first conventional adder 220. This first shift register 216 is intended to be used during the operations of calculating the first modular exponentiation. Thus this register is used in a similar manner to the register 164 of FIG. 9 in order to calculate the {overscore (a)}_(i).

[0454] The second shift register 218 is similar to the first shift register 216. However, it is connected to the second data output buffer of the memory 210 and to the input of the second conventional adder 222. This shift register is intended to be used during the operations of calculating the second modular exponentiation.

[0455] The first conventional adder 220 is connected to the first data input buffer of the memory 210. This conventional adder 220 is intended to be used for calculating the first modular exponentiation. Its structure and its operation are similar to those of the conventional adder 160 of FIG. 9.

[0456] The second conventional adder 222 is connected to the output of the second shift register 218 and to the second input buffer of the memory 210. Its structure and its operation are similar to those of the conventional adder 160 of FIG. 9.

[0457] The block 224 of carry-save adders is connected to the first and the second data output buffers of the memory 210, and to the input of the recombination and reduction block 226. This block 224 comprises two carry-save adders 230 and 232. The first and the second carry-save adders 230, 232 are respectively adapted to carry out the first addition operation 76 and the second addition operation 80 of the method of FIG. 5. These two carry-save adders 230, 232 are controlled by the second control means 214 so that the operations of calculating the first and the second Montgomery products are interlaced. Thus after an initialisation phase the first addition operation 76 for the first Montgomery product is performed by the first carry-save adder 230 whilst at the same time the second addition operation 80 for the second Montgomery product is performed by the second carry-save adder 232. Then during the following operations of executing the loop of operations 72, the situation is reversed, that is to say that the carry-save adder 230 performs the first addition operation 76 for the calculation of the second Montgomery product whilst at the same time the second carry-save adder 232 performs the second addition operation 80 for the calculation of the first Montgomery product. The second control means 214 take advantage of the fact that in the method of FIG. 5 applied to the calculation of one single Montgomery product the first and the second addition operations are always successive and cannot be carried out at the same time. Consequently during the calculation of a single Montgomery product there is always a carry-save adder which is inactive. Thus the second control means described here control the inactive carry-save adder in order to perform an addition operation intended for a second Montgomery product performed in parallel with the first.

[0458] The recombination and reduction block 226 is made up of a conventional adder 236 connected to the input of a conventional subtractor 238. The conventional adder 236 is connected to the output of the block 224 of carry-save adders. This conventional adder 236 is adapted to carry out the recombination operation 84 of the method of FIG. 5.

[0459] The subtractor 238 is connected for example to the input of the principal computing means 201 capable of using the result of the Montgomery product. The subtractor 238 is adapted to carry out the reduction operation 86 of the method of FIG. 5.

[0460] The second control means 214 are provided in a conventional manner and are connected to all of the components of the Montgomery multiplier 202. They are also adapted to control the different operations of calculating the first and the second Montgomery products produced by the Montgomery multiplier 202.

[0461] The Montgomery multiplier 202 is produced for example with the aid of an FPGA or ASIC component.

[0462] The shift left means 204 are connected to the input and to the output of the principal computing means 201 under the control of the first control means 206.

[0463] The means 204 for carrying out a shift to the left include a memory 240 of the RAM type (random access memory) in which a first and a second exponent are stored which correspond respectively to those of the first and the second modular exponentiations. The first and the second exponents are denoted respectively E1 and E2. This memory 240 is connected to the input of a first and a second r-bit shift left register 242, 244, r being the parameter of the m-ary method.

[0464] The shift left register 242 is adapted to determine and supply the variables F_(i) derived from the exponent E1 in accordance with step 118 of the method of FIG. 7. This shift register includes a number of bits which is lower than that of the exponent E1, for example 32 bits whereas the exponent E1 is encoded on 512 bits. Thus as soon as all of the bits contained in this register have been shifted, the register is immediately reloaded with the following 32 bits of the exponent E1 extracted from the memory 240. This makes it possible to use a 32-bit shift register to shift the numbers encoded on a higher number of bits.
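The reloading of a 32-bit register to deliver the words F_(i) of a longer exponent can be modelled as follows. This is a behavioural sketch only and rests on several assumptions that are not in the text: the function name `exponent_words` is hypothetical, the register is shifted one bit at a time, and the exponent is padded with leading zeros to a multiple of 160 bits so that the 5-bit words stay aligned with the 32-bit reloads.

```python
def exponent_words(e, r_bits=5, reg_bits=32):
    """Return the r-bit words of e, most significant first.

    Models a reg_bits-wide shift left register reloaded from a memory.
    """
    mask = (1 << reg_bits) - 1
    block = r_bits * reg_bits                      # common multiple of both widths
    total = -(-max(e.bit_length(), 1) // block) * block
    # Memory contents: the padded exponent cut into 32-bit chunks, MSB first.
    memory = [(e >> (total - reg_bits * (j + 1))) & mask
              for j in range(total // reg_bits)]
    words, word, count = [], 0, 0
    for chunk in memory:
        register = chunk                           # reload once the register is empty
        for _ in range(reg_bits):
            top_bit = register >> (reg_bits - 1)   # bit leaving the register
            register = (register << 1) & mask      # shift left by one position
            word = (word << 1) | top_bit
            count += 1
            if count == r_bits:                    # one complete word F_i
                words.append(word)
                word, count = 0, 0
    return words


# For E = 100 the two least significant words are 3 and 4.
assert exponent_words(100)[-2:] == [3, 4]
```

The leading words produced by the padding are all equal to 0 and do not change the value of the exponent.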

[0465] The shift left register 244 is similar to the shift register 242, but it is intended to supply the variables F′_(i) derived from the exponent E2.

[0466] The first control means 206 are connected to the shift left means 204 and to the second control means 214. They are adapted to control the shift left means 204 and the Montgomery multiplier 202 by means of the second control means 214. They are also connected to the principal computing means 201 and adapted to co-operate with these latter in order to implement the method of FIG. 7. Thus the steps 110 and 112 of the method of FIG. 7 are, for example, carried out by the computing means 201 whilst the steps 114 to 122 make use of the computing hardware 200 to reduce the calculation time.

[0467] All of the elements of FIG. 10 are for example implemented in an FPGA component or in an ASIC component. In a variant this component is associated with other electronic components on an electronic card in such a way as to produce an electronic card which conforms to the PCI standard. A card which conforms to the PCI standard can be slotted into standard computers, and these latter are then adapted to form the principal computing means.

[0468] In a variant the first modular exponentiation is carried out on the least significant bits of the input message whilst the second modular exponentiation is carried out on the most significant bits of this same message, and the results of the exponentiations on the least significant bits and the most significant bits are then recombined in order to obtain the final result.

[0469] The operation of the components of the computing hardware shown in FIGS. 9 and 10 is conventional per se. The functioning of the co-operation between these different components follows directly from the methods described with regard to FIGS. 5 and 7. Consequently the co-operation between the different components will not be described in greater detail here.

[0470] The operation of the method of FIG. 7 will now be illustrated with the aid of a simple example consisting of calculating the following modular exponentiation:

149¹⁰⁰ mod 165

[0471] where:

[0472] 149 is the value of the input message in decimal, denoted M in this example;

[0473] 100 is the value of the exponent in decimal, denoted E in this example;

[0474] 165 is the value of the modulus in decimal, denoted n in this example.

[0475] In the following description of this example, and in order to simplify the presentation, the Montgomery products are calculated according to the high-radix Montgomery method of FIG. 2 and not by the method of FIG. 5. The radix is chosen here to be equal to 4 bits.

[0476] Moreover, the parameter r of the m-ary method is chosen here to be equal to 5 bits.

[0477] The binary representations of M, n and E are as follows:

M=1001 0101(=149 dec)

E=0110 0100(=100 dec)

n=1010 0101(=165 dec).

[0478] It will be deduced from these binary representations that the input variables are encoded on 8 bits and that consequently the parameter p of the step 110 of FIG. 7 which is necessary in order to calculate the remainder of M, denoted {overscore (M)}, is equal to 2⁸, that is to say 256. The step 110 of the method of FIG. 7 therefore consists of carrying out the following calculation:

{overscore (M)}=149×256 mod 165.

[0479] By a conventional method, such as the extended Euclidean algorithm, this gives: {overscore (M)}=29 dec.

[0480] The step 112 of FIG. 7 consists of calculating n′₀ according to the relationship defined at step 2 of FIG. 1. For this, first of all n₀ is determined, that is to say the 4 least significant bits of the modulus n. n₀ is then equal to 5. Next, n₀⁻¹ is calculated with the aid of the following relationship:

n₀.n₀⁻¹=1 mod 16.

[0481] In order to calculate the value of the variable n₀⁻¹ use is made of the fact that this value is a natural integer between 0 and 15. Consequently for each possible value of the variable n₀⁻¹ the following product is calculated:

n₀.n₀⁻¹ mod 16.

[0482] Then the value of n₀⁻¹ which satisfies the previously defined relationship is selected. By this method it is determined that n₀⁻¹ is equal to 13.

[0483] Next its opposite modulo 16 is calculated, that is to say 16−13, and n′₀=3 is obtained.
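The exhaustive search for n₀⁻¹ and the derivation of n′₀ can be reproduced by the following sketch. The function name `n0_prime` is hypothetical, and the modulus is assumed to be odd so that the inverse exists.

```python
def n0_prime(n, omega):
    """Return (n0, n0_inv, n0') for an odd modulus n and a radix of omega bits."""
    base = 1 << omega
    n0 = n % base                                   # omega least significant bits
    # Try every candidate between 0 and base-1, as in the text.
    n0_inv = next(x for x in range(base) if (n0 * x) % base == 1)
    return n0, n0_inv, (base - n0_inv) % base       # n0' = -n0^(-1) mod 2^omega


print(n0_prime(165, 4))   # (5, 13, 3)
```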

[0484] The step 114 of the method of FIG. 7 consists of pre-calculating the 16 possible values of the second products m.n. Given the simplicity of the example described here, this will be done not in this step but directly at the moment when the value of one of the second products is required.

[0485] The step 116 consists of calculating {overscore (M)}^(α) for the successive values of α between 2 and 31. However, in the particular example described here the exponent E breaks down into only two 5-bit words F₀ and F₁ of which the values are as follows:

F ₀=00100(=4 dec)

F ₁=00011(=3 dec).

[0486] Consequently only the variables {overscore (M)}³ and {overscore (M)}⁴ are necessary in order to perform the following steps. Therefore only the two variables {overscore (M)}³ and {overscore (M)}⁴ will be calculated here.

[0487] In order to calculate {overscore (M)}³ and {overscore (M)}⁴ the following operations are carried out successively:

{overscore (M)} ² =MonPro({overscore (M)},{overscore (M)})

{overscore (M)} ³ =MonPro({overscore (M)},{overscore (M)} ²)

{overscore (M)} ⁴ =MonPro({overscore (M)},{overscore (M)} ³).

[0488] The calculation of these different Montgomery products is carried out according to the method described with regard to FIG. 2. The method is identical for the calculation of {overscore (M)}², {overscore (M)}³ and {overscore (M)}⁴, and therefore only the calculation of {overscore (M)}² is described below.

[0489] At step 16 of the method of FIG. 2 applied to the calculation of {overscore (M)}², the first products {overscore (M)}_(i).{overscore (M)} are pre-calculated, where the variable {overscore (M)}_(i) takes successively the following values:

{overscore (M)} ₀=1101(=13 dec)

{overscore (M)} ₁=0001(=1 dec)

[0490] After calculation,

{overscore (M)}₀.{overscore (M)}=377; and

{overscore (M)}₁.{overscore (M)}=29

[0491] are obtained.

[0492] The loop 18 of operations of FIG. 2 is then executed successively for the indices i=0 and i=1.

[0493] For i=0, the operations 24 to 30 of the loop 18 are therefore as follows:

u:={overscore (M)}₀.{overscore (M)}=1 0111 1001 (=377 dec)

m:=u₀.n′₀ mod 2^(ω)=9×3 mod 16=11

u:=u+m.n=377+11×165=2192

u:=u/2^(ω)=2192/16=137.

[0494] For the index i=1, the operations 24 to 30 of the loop 18 are therefore as follows:

u:=u+{overscore (M)}₁.{overscore (M)}=137+29=166

m:=u₀.n′₀ mod 2^(ω)=6×3 mod 16=2

u:=u+m.n=166+2×165=496

u:=u/2^(ω)=496/16=31.

[0495] Therefore {overscore (M)}²=31 is obtained. In a similar manner it is determined that {overscore (M)}³=164; and {overscore (M)}⁴=16.
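The loop of operations 18 used in this example can be replayed with the following sketch, written under the assumption that the operations 24 to 30 are the four assignments shown above. The function name `mon_pro_high_radix` and its parameters are hypothetical; the function returns both the reduced result and the value of u at the end of the loop, before the reduction step 20.

```python
def mon_pro_high_radix(a_bar, b_bar, n, n0_prime, omega, digits):
    """High-radix Montgomery product; returns (result, u at the end of the loop)."""
    base = 1 << omega
    u = 0
    for i in range(digits):
        a_i = (a_bar >> (omega * i)) & (base - 1)   # i-th word of a_bar
        u += a_i * b_bar                            # u := u + a_i.b
        m = ((u % base) * n0_prime) % base          # m := u0.n'0 mod 2^omega
        u += m * n                                  # u := u + m.n
        u >>= omega                                 # u := u / 2^omega (exact)
    loop_end = u
    if u >= n:                                      # reduction step
        u -= n
    return u, loop_end


# Values of the example: n=165, n'0=3, radix of 4 bits, two words per operand.
print(mon_pro_high_radix(29, 29, 165, 3, 4, 2))    # (31, 31)
print(mon_pro_high_radix(29, 31, 165, 3, 4, 2))    # (164, 164)
print(mon_pro_high_radix(29, 164, 165, 3, 4, 2))   # (16, 181)
```

The third call shows the only case of the three in which the value at the end of the loop exceeds the modulus and the reduction is actually applied.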

[0496] It will be noted that {overscore (M)}⁴ at the end of the loop of operations 18 is equal to 181, which is higher than the modulus, and consequently the reduction step 20 must be performed, which gives 181−165=16.

[0497] During the operation 118 of the method of FIG. 7, the value of the variable {overscore (M)}^(F_(s−1)), that is to say here {overscore (M)}^(F₁)={overscore (M)}³, is allocated to the variable {overscore (C)}.

[0498] The operations 126 and 128 of the loop of operations 120 of the method of FIG. 7 are then performed for the value of the index i=0.

[0499] The operation 126 consists of calculating the variable {overscore (C)}³², that is to say here calculating ({overscore (M)}³)³². The following successive operations are then performed:

{overscore (M)} ⁸ =MonPro({overscore (M)} ⁴ ,{overscore (M)} ⁴)

{overscore (M)} ¹⁶ =MonPro({overscore (M)} ⁸ ,{overscore (M)} ⁸)

{overscore (M)} ³² =MonPro({overscore (M)} ¹⁶ ,{overscore (M)} ¹⁶)

{overscore (M)}⁶⁴ =MonPro({overscore (M)} ³² ,{overscore (M)} ³²)

{overscore (M)} ⁹⁶ =MonPro({overscore (M)} ⁶⁴ ,{overscore (M)}³²)=({overscore (M)} ³)³²

[0500] These Montgomery products are calculated according to the method described with regard to FIG. 2. The calculations of the variables {overscore (M)}¹⁶, {overscore (M)}³², {overscore (M)}⁶⁴, {overscore (M)}⁹⁶ are similar to that of {overscore (M)}⁸, and therefore they will not be described in detail here.

[0501] The calculation of {overscore (M)}⁸ is carried out according to the following relationship:

{overscore (M)}⁸=MonPro({overscore (M)}⁴, {overscore (M)}⁴)=MonPro(16, 16)

[0502] During the step 16 of the method of FIG. 2, the two first products {overscore (a)}_(i).{overscore (b)}, that is to say here {overscore (M)}₀⁴.{overscore (M)}⁴ and {overscore (M)}₁⁴.{overscore (M)}⁴, are pre-calculated. The values of {overscore (M)}₀⁴ and {overscore (M)}₁⁴ are as follows:

{overscore (M)} ₀ ⁴=0000(=0 dec)

{overscore (M)} ₁ ⁴=0001(=1 dec)

[0503] From this the following values of the first products are deduced:

{overscore (M)} ₀ ⁴ .{overscore (M)} ⁴=0×16=0

{overscore (M)} ₁ ⁴ .{overscore (M)} ⁴=1×16=16

[0504] The loop of operations 18 of FIG. 2 is then executed successively for i=0 and i=1.

[0505] For i=0, the operations 24 to 30 of the loop 18 are therefore as follows:

u:=u+{overscore (a)} _(i) .{overscore (b)}=0

m:=u ₀. n′₀ mod 2^(ω)=0×3 mod 16=0

u:=u+m.n=0+0×165=0

u:=u/2^(ω)=0/16=0

[0506] For i=1, the operations 24 to 30 of the loop 18 are therefore as follows:

u:=u+{overscore (a)} _(i) .{overscore (b)}=0+16=16

m:=u ₀ . n′ ₀ mod 2^(ω)=0×3 mod 16=0

u:=u+m.n=16+0×165=16

u:=u/2^(ω)=16/16=1
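
By way of illustration only, the two iterations of the loop 18 traced above can be sketched in Python as follows. The sketch assumes ω=4, two digits of ω bits and n=165, as in the example. The function name monpro_high_radix, its parameters and the returned trace are hypothetical and are not part of the method of FIG. 2. It uses ordinary integer additions, not the carry-save adders of the invention, and requires Python 3.8 or later for pow with a negative exponent.

```python
def monpro_high_radix(a, b, n, w=4, num_digits=2):
    """Return (a*b*2^(-w*num_digits) mod n, trace), scanning a in w-bit digits.

    Hypothetical helper: assumes n is odd and 0 <= a, b < n < 2^(w*num_digits).
    """
    base = 1 << w
    # n'0 = -n^(-1) mod 2^w; this gives 3 for n = 165 and w = 4.
    n0_prime = (-pow(n, -1, base)) % base
    u = 0
    trace = []
    for i in range(num_digits):
        a_i = (a >> (w * i)) & (base - 1)          # i-th digit of a
        u += a_i * b                               # operation 24: u := u + a_i.b
        m = ((u & (base - 1)) * n0_prime) % base   # operation 26: m := u0.n'0 mod 2^w
        u += m * n                                 # operation 28: u := u + m.n
        u >>= w                                    # operation 30: u := u / 2^w (exact)
        trace.append((a_i, m, u))
    if u >= n:                                     # reduction step 20
        u -= n
    return u, trace


print(monpro_high_radix(16, 16, 165))  # (1, [(0, 0, 0), (1, 0, 1)])
```

Each trace entry holds the digit a_i, the multiplier m and the value of u after the division, which matches the values 0, 0, 0 for i=0 and 1, 0, 1 for i=1 given above.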

[0507] In a similar manner the following numerical results are obtained:

{overscore (M)}¹⁶=136

{overscore (M)}³²=31

{overscore (M)}⁶⁴=16;

{overscore (M)}⁹⁶=136.
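
The chain of Montgomery products of operation 126 can be checked with a short Python sketch. It assumes n=165 and R=2⁸=256, the latter being inferred from the two iterations of ω=4 bits in the loop 18. The helper monpro is hypothetical shorthand that computes a.b.R⁻¹ mod n directly with a modular inverse (Python 3.8 or later) instead of the digit loop of FIG. 2.

```python
N = 165                   # modulus of the example
R = 1 << 8                # assumed: two digits of 4 bits
R_INV = pow(R, -1, N)     # modular inverse of R mod N (136 here)


def monpro(a, b):
    """Reference Montgomery product a*b*R^(-1) mod N (hypothetical helper)."""
    return (a * b * R_INV) % N


m4 = 16                   # value of M-bar^4 taken from the example
m8 = monpro(m4, m4)
m16 = monpro(m8, m8)
m32 = monpro(m16, m16)
m64 = monpro(m32, m32)
m96 = monpro(m64, m32)    # equals (M-bar^3)^32

print(m8, m16, m32, m64, m96)  # 1 136 31 16 136
```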

[0508] When the operation 128 of the method of FIG. 7 is being carried out, F₀ being different from 0, the Montgomery product between the variable {overscore (C)}³² and {overscore (M)}^(F₀) is calculated according to the following relationship:

{overscore (C)}:=MonPro({overscore (M)} ⁹⁶ ,{overscore (M)} ⁴)

[0509] where:

[0510] {overscore (M)}⁹⁶=136;

[0511] {overscore (M)}⁴=16.

[0512] At the end of the calculation of this Montgomery product according to the method of FIG. 2 the following result is obtained:

{overscore (C)}:=MonPro(136, 16)=91

[0513] The loop of operations 120 of the method of FIG. 7 is executed only once, since the initial value of the index i is 0.
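
The single pass through the loop 120 described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the circuit of the invention: n=165, R=2⁸ and a word size r=5 are taken from the example, the starting value 164 is {overscore (M)}³ and the table entry 16 is {overscore (M)}⁴. The names monpro and mary_iteration are hypothetical. The sketch raises {overscore (C)} to the power 32 by five successive squarings, whereas the example above reaches the same value through {overscore (M)}⁶⁴ and {overscore (M)}³².

```python
N = 165                        # modulus of the example
R_INV = pow(1 << 8, -1, N)     # assumed R = 2^8; requires Python 3.8 or later


def monpro(a, b):
    """Reference Montgomery product a*b*R^(-1) mod N (hypothetical helper)."""
    return (a * b * R_INV) % N


def mary_iteration(c_bar, table_entry, r=5):
    """One pass of the loop: operation 126, then operation 128 if F_i != 0."""
    for _ in range(r):                     # operation 126: C-bar := C-bar^(2^r)
        c_bar = monpro(c_bar, c_bar)
    if table_entry is not None:            # operation 128 is skipped when F_i = 0
        c_bar = monpro(c_bar, table_entry)
    return c_bar


print(mary_iteration(164, 16))  # 91
```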

[0514] At the end of the execution of the loop of operations 120, the step 122 is performed. It consists of carrying out the following operation:

{overscore (C)}:=MonPro({overscore (C)}, 1)

[0515] where:

1. Method of processing the calculation of a Montgomery product on the basis of the high-radix Montgomery method, the said method being implemented on computing hardware (150; 200) formed from a set of electronic components comprising at least one carry-save adder (157, 158; 230, 232), the said method comprising a loop of operations, by reiteration of successive operations carried out by the said computing hardware (150; 200) comprising at least: a first arithmetic operation (24, 76) of addition of a value of one of several first products, denoted {overscore (a)}_(i).{overscore (b)} and a value of a variable, denoted u; a second arithmetic operation (28; 80) of addition of a value of one of several second products, denoted m.n, and a value of the said variable u, characterised in that it consists of: delivering, at the input of the said at least one carry-save adder, the value of the variable u in the form of a carry-save ordered pair and the said value of one of several products, denoted {overscore (a)}_(i).{overscore (b)}, m.n respectively, in order to perform the said first and second arithmetic addition operations and in order to obtain at the output the result of respectively the first and the second arithmetic addition operations in the form of a carry-save ordered pair, allocating to the value of the variable u the result obtained at the output to the said at least one carry-save adder, and repeating the delivery and allocation operations for each of the said iterations.
 2. Method as claimed in claim 1 including in the loop of operations a third operation of division (30; 76) of the variable u by a power of 2, denoted 2^(ω) where ω is the radix, according to a third relationship u:=u/2^(ω), characterised in that the variable u is registered in the form of a carry-save ordered pair formed by two variables, denoted C and S, for performing operations of the loop (72), and that the third operation of division of the variable u in the form of a carry-save ordered pair is carried out in two steps, namely: a preliminary step (82) of calculation and storage of a carry digit, denoted R_(e), which is at risk of being lost by the division of each variable C and S by the power of 2; a step of division (76) of each variable C and S by the power of 2.
 3. Method as claimed in claim 2, characterised in that the preliminary step (82) of calculation of the carry digit R_(e) comprises the operation of adding in a conventional manner ω least significant bits of the variable C, denoted C₀, to ω least significant bits of the variable S, denoted S₀, according to a fourth relationship R_(e):=C₀+S₀.
 4. Method as claimed in claim 3, characterised in that a recombination (78, 84) of u on the basis of the variables C and S of the carry-save ordered pair and of the carry digit R_(e) comprises the operation of shifting to the right by ω bits the carry digit R_(e) and in a conventional manner adding the result obtained to the variables C and S according to a fifth relationship u:=C+S+R_(e)/2^(ω).
 5. Method as claimed in any one of claims 2 to 4, characterised in that it comprises at the end of performing the loop of operations (72): a step of recombination (84) of the variable u on the basis of at least the values of the variables C and S of the carry-save ordered pair calculated during the performance of the loop of operations, and a step of reduction (86) of the variable u according to a sixth relationship u:=u−n, where n is a modulus, the said steps of recombination and of reduction of the variable u overlapping in such a way as to speed up the time required to perform them.
 6. Method as claimed in any one of the preceding claims, characterised in that the radix ω is equal to 4 bits in order to optimise the time required for performing the calculation of a Montgomery product on input variables of the Montgomery product encoded on 512 or 1024 bits.
 7. Method as claimed in any one of the preceding claims, characterised in that the first products {overscore (a)}_(i).{overscore (b)} are pre-calculated before performing the loop of operations (72).
 8. Method as claimed in any one of the preceding claims, characterised in that the second products m.n are pre-calculated before performing the loop of operations (72).
 9. Method of speeding up the time required to perform the calculation of a first and a second Montgomery product by applying for each product a method as claimed in any one of claims 1 to 8, characterised in that it includes at least one first step during which the first addition operation (76) for the first product is carried out at the same time as the second addition operation (80) for the second product.
 10. Method as claimed in claim 9, characterised in that it comprises at least a second step shifted in time with respect to the first, during which the second addition operation (80) for the first product is carried out at the same time as the first addition operation (76) for the second product.
 11. Method as claimed in claim 9 or 10, characterised in that it comprises at the end of performing the loop of operations (72): a step of recombination (84) then of reduction (86) for the first product performed first; and then, a step of recombination (84) then of reduction (86) for the second product performed second.
 12. Method as claimed in one of claims 9 to 11, characterised in that one of the input variables of the first Montgomery product performed first is made up of the least significant bits of a variable, and one of the input variables of the second Montgomery product performed second is made up of the most significant bits of this same variable.
 13. Method of speeding up the time required to perform the calculation of a modular multiplication by applying a method implementing Montgomery products, characterised in that the calculation of the Montgomery products is carried out by applying at least one of the methods as claimed in at least one of claims 1 to 12.
 14. Method as claimed in claim 13, characterised in that the said method implementing Montgomery products is the Montgomery method.
 15. Method of speeding up the time required to perform the calculation of a modular exponentiation by applying a method implementing modular multiplications, characterised in that the calculation of the modular multiplications is carried out by applying a method as claimed in claim 13 or 14.
 16. Method as claimed in claim 15, characterised in that the said method implementing modular multiplications is the m-ary method with a word size of r bits.
 17. Method as claimed in claim 16, characterised in that the word size r of the m-ary method is equal to 5 bits in order to speed up the time for performing the m-ary method when input variables of the modular exponentiation calculation are encoded on 512 or 1024 bits.
 18. Method as claimed in claim 16 or 17, characterised in that the second products m.n are pre-calculated before applying the m-ary method.
 19. Method as claimed in claim 15, characterised in that the said method implementing modular multiplications is the Chinese remainders method.
 20. Method of speeding up the time required for performing a first modular exponentiation calculation by applying a method implementing second modular exponentiations, characterised in that the second modular exponentiations are carried out by applying a method as claimed in one of claims 15 to 19.
 21. Method as claimed in claim 20, characterised in that the said method implementing second modular exponentiations is the Chinese remainders method.
 22. Method as claimed in any one of the preceding claims, characterised in that it is applied to numbers encoded on more than 320 bits.
 23. Computer programme comprising programme code instructions for performing certain steps of the method as claimed in any one of claims 13 to 21 when the said programme is executed on principal computing means (201) associated with the said computing hardware (150; 200).
 24. System for processing the calculation of a Montgomery product on the basis of the high-radix Montgomery method, the said system including computing hardware (150; 200) formed from a set of electronic components, the said processing comprising a loop of operations, by reiteration of successive operations carried out by the said computing hardware (150; 200) comprising: a first arithmetic operation (24, 76) of addition of a value of one of several first products, denoted {overscore (a)}_(i).{overscore (b)} and a value of a variable, denoted u; a second arithmetic operation (28; 80) of addition of a value of one of several second products, denoted m.n, and a value of the said variable u, characterised in that the said computing hardware (150; 200) includes at least: one carry-save adder adapted to receive as input the variable u in the form of a carry-save ordered pair and the said value of one of several products, denoted {overscore (a)}_(i).{overscore (b)}, m.n respectively, and to deliver at the output the result of respectively the first and the second arithmetic addition operations in the form of a carry-save ordered pair, means for allocating to the value of the variable u the result obtained at the output of the said at least one carry-save adder.
 25. System as claimed in claim 24, characterised in that the means for effecting the first and the second addition operations include at least one first carry-save adder (157; 230) adapted to carry out the first addition operation and a second carry-save adder (158; 232) adapted to carry out the second addition operation.
 26. System as claimed in one of claims 24 to 25, including conventional means for carrying out a third operation of division of the variable u by a power of 2, denoted 2^(ω), where ω is the radix, according to a third relationship u:=u/2^(ω), characterised in that it includes means for storing the variable u in the form of a carry-save ordered pair formed by two variables, denoted C and S, and means for carrying out the third operation of division of the variable u in the form of a carry-save ordered pair comprising: means for calculation and storage of a carry digit, denoted R_(e), which is at risk of being lost by the division of each variable C and S by the power of 2; means for division of each variable C and S by the power of 2.
 27. System as claimed in claim 26, characterised in that the means for calculation and storage of the carry digit R_(e) include means for conventional addition of the ω least significant bits of the variable C, denoted C₀, to the ω least significant bits of the variable S, denoted S₀, according to a fourth relationship R_(e):=C₀+S₀.
 28. System as claimed in any one of claims 24 to 29, characterised in that it comprises: means (162, 236) for recombination of the variable u at least on the basis of the values of the variables C and S of the carry-save ordered pair, means (166; 238) for reduction of the variable u, the said means for recombination of the variable u and the said means for reduction being connected to one another in such a way that operation thereof overlaps under the control of the control means (156; 214).
 29. System as claimed in any one of claims 24 to 30, characterised in that the radix ω is equal to 4 bits in order to optimise the time required to perform a Montgomery product calculation on input variables of the Montgomery product encoded on 512 or 1024 bits.
 30. System as claimed in any one of claims 24 to 31, characterised in that it includes means (164, 160; 216, 214, 218, 222) for pre-calculation of the first products {overscore (a)}_(i).{overscore (b)}.
 31. System as claimed in any one of claims 24 to 31, characterised in that it includes means (164, 160; 216, 214, 218, 222) for pre-calculation of the second products m.n.
 32. System as claimed in claim 30 or 31, characterised in that the said means for pre-calculation of the first and/or the second products include a conventional adder (160; 220, 222).
 33. System for speeding up the time required to perform the calculation of a first and a second Montgomery product, characterised in that it includes two carry-save adders (230, 232) which are activated simultaneously.
 34. System as claimed in claim 33, characterised in that it includes a single means (162) for recombining the variable u on the basis of at least the values of the variables C and S of the carry-save ordered pair, connected to the input of a single means (166) for reduction of the variable u.
 35. System for speeding up the time required to perform a modular multiplication calculation by a method implementing Montgomery products, the said Montgomery product calculations being performed on computing hardware (150; 200), characterised in that it includes at least one system (150; 202) for speeding up the time required to perform the calculation of the Montgomery products as claimed in one of claims 24 to 34.
 36. System for speeding up the time required to perform a modular multiplication calculation by the Montgomery method implementing Montgomery products on computing hardware (150; 200), characterised in that it includes at least one system (150; 202) for speeding up the time required to perform the calculation of the Montgomery products as claimed in one of claims 24 to 34.
 37. System for speeding up the time required to perform a modular exponentiation calculation by a method implementing modular multiplications, characterised in that it includes at least one system (150; 200) for speeding up the time required to perform the calculation of the modular multiplications as claimed in claim 35 or 36.
 38. System for speeding up the time required to perform a modular exponentiation calculation by the m-ary method with a word size of r bits implementing modular multiplications, characterised in that it includes at least one system (150; 200) for speeding up the time required to perform the calculation of the modular multiplications as claimed in claim 35 or 36.
 39. System as claimed in claim 38, characterised in that it includes at least one register (242, 24) for shifting 5 bits to the left in order to speed up the performance of the m-ary method with a word size of r bits of the m-ary method equal to 5 bits.
 40. System for speeding up the time required to perform a modular exponentiation calculation by the Chinese remainders method implementing modular multiplications, characterised in that it includes at least one system (150; 200) for speeding up the time required to perform the modular multiplication calculation as claimed in claim 37 or 39.
 41. System for speeding up the time required to perform the calculation of a first modular exponentiation by a method implementing second modular exponentiations, characterised in that it includes at least one system (150; 200) for speeding up the time required to perform the calculation of the second modular exponentiations as claimed in any one of claims 37 to 40.
 42. System for speeding up the time required to perform at least the calculation of at least a first modular exponentiation by the Chinese remainders method which itself implements second modular exponentiations, characterised in that it includes at least one system (150; 200) for speeding up the time required to perform the calculation of the second modular exponentiations as claimed in any one of claims 39 to 41.
 43. Electronic component, characterised in that it includes at least one system as claimed in one of claims 24 to 42.
 44. Electronic component as claimed in claim 45, characterised in that it is formed with at least one FPGA.
 45. Electronic card, characterised in that it includes at least one system as claimed in one of claims 24 to 44.
 46. Electronic card as claimed in claim 45, characterised in that it conforms to the PCI standard.
 47. Machine characterised in that it is associated with at least one system as claimed in one of claims 24 to 46.
 48. Method of processing the calculation of a first modular exponentiation, denoted M^(E) mod n, where M is the input message, E is the exponent and n is the modulus, with the aid of principal computing means (201) formed by a computer, characterised in that it comprises the following steps: a first step of writing the first modular exponentiation at the input of the principal computing means, a second step of activating the said first modular exponentiation on the principal computing means (201) of means for processing according to the Chinese remainders method in order to obtain at the output two second modular exponentiations to be processed, a third step of activating means for processing according to the m-ary method of each of the second modular exponentiations, the m-ary method implementing modular multiplications, steps of activating means for processing according to the Montgomery method of each of the said modular multiplications of the m-ary method.
 49. Method as claimed in claim 48, characterised in that the input variables are natural integers encoded on more than 320 bits.
 50. Method as claimed in claim 48 or 49, characterised in that the word size r of the m-ary method is equal to 5 bits in order to speed up the time required to perform the m-ary method when the input variables of the calculation of the modular exponentiation are encoded on 512 or 1024 bits.
 51. Method as claimed in any one of claims 48 to 50, characterised in that the calculations of the second modular exponentiations are carried out substantially in parallel.
 52. Method as claimed in any one of claims 48 to 51, characterised in that the Montgomery products are calculated using the high-radix Montgomery method.
 53. Method as claimed in claim 52, characterised in that the high-radix Montgomery method is implemented in accordance with one of the methods as claimed in any one of claims 1 to 9.
 54. Computer programme comprising programme code instructions for performing certain steps of the method as claimed in any one of claims 48 to 52 when the said programme is executed on the principal computing means (201).